PACER: A Fully Push-forward-based Distributional Reinforcement Learning Algorithm
Abstract: In this paper, we propose PACER, the first fully push-forward-based distributional reinforcement learning algorithm, which consists of a distributional critic, a stochastic actor, and a sample-based encourager. Specifically, the push-forward operator is leveraged in both the critic and the actor to model the return distributions and the stochastic policies, respectively, endowing the two with equal modeling capability and thereby enhancing their synergy. Since the density function of a push-forward policy is infeasible to obtain, novel sample-based regularizers are integrated into the encourager to incentivize efficient exploration and reduce the risk of becoming trapped in local optima. Moreover, a sample-based stochastic utility value policy gradient is established for the push-forward policy update, circumventing the explicit policy density function required by existing REINFORCE-based stochastic policy gradients. As a result, PACER fully exploits the modeling capability of the push-forward operator and is able to explore a broader policy space than the restricted policy classes (e.g., Gaussians) used in existing distributional actor-critic algorithms. We validate the critical role of each component in our algorithm with extensive empirical studies. Experimental results demonstrate the superiority of our algorithm over the state of the art.
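To make the central idea concrete, the sketch below shows a push-forward (implicit) stochastic policy in the sense described above: the actor is a network that transports base noise samples to actions, so the policy is represented purely by samples and its density is never computed. This is a minimal illustration under assumed names and dimensions (`PushForwardPolicy`, `noise_dim`, layer sizes), not the authors' implementation.

```python
# Minimal sketch of a push-forward stochastic policy: actions are obtained by
# pushing a base Gaussian measure through a neural network, so the policy is
# accessible only through samples, never through an explicit density.
# All names and sizes are illustrative assumptions.
import torch
import torch.nn as nn


class PushForwardPolicy(nn.Module):
    def __init__(self, state_dim: int, action_dim: int,
                 noise_dim: int = 8, hidden: int = 64):
        super().__init__()
        self.noise_dim = noise_dim
        self.net = nn.Sequential(
            nn.Linear(state_dim + noise_dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, hidden),
            nn.ReLU(),
            nn.Linear(hidden, action_dim),
            nn.Tanh(),  # bound actions to [-1, 1]
        )

    def forward(self, state: torch.Tensor, n_samples: int = 1) -> torch.Tensor:
        # Each noise draw is mapped to one action sample; repeating the draw
        # yields a sample-based representation of the policy at this state.
        batch = state.shape[0]
        noise = torch.randn(batch, n_samples, self.noise_dim)
        state = state.unsqueeze(1).expand(-1, n_samples, -1)
        return self.net(torch.cat([state, noise], dim=-1))


# Usage: draw several action samples per state; downstream objectives
# (e.g., the sample-based regularizers of the encourager) operate on these
# samples directly, since the policy density is unavailable.
policy = PushForwardPolicy(state_dim=17, action_dim=6)
actions = policy(torch.zeros(32, 17), n_samples=4)  # shape (32, 4, 6)
```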
Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Chow Y, Tamar A, Mannor S, et al (2015) Risk-sensitive and robust decision-making: a cvar optimization approach. Advances in neural information processing systems 28 Chow et al [2017] Chow Y, Ghavamzadeh M, Janson L, et al (2017) Risk-constrained reinforcement learning with percentile risk criteria. J Mach Learn Res 18(1):6070–6120 Creswell et al [2018] Creswell A, White T, Dumoulin V, et al (2018) Generative adversarial networks: An overview. IEEE signal processing magazine 35(1):53–65 Dabney et al [2018a] Dabney W, Ostrovski G, Silver D, et al (2018a) Implicit quantile networks for distributional reinforcement learning. In: ICML 2018, PMLR, pp 1096–1105 Dabney et al [2018b] Dabney W, Rowland M, Bellemare M, et al (2018b) Distributional reinforcement learning with quantile regression. In: AAAI 2018 Duan et al [2021] Duan J, Guan Y, Li SE, et al (2021) Distributional soft actor-critic: Off-policy reinforcement learning for addressing value estimation errors. IEEE transactions on neural networks and learning systems Engel et al [2005] Engel Y, Mannor S, Meir R (2005) Reinforcement learning with gaussian processes. In: Proceedings of the 22nd international conference on Machine learning, pp 201–208 Fujimoto et al [2018] Fujimoto S, Hoof H, Meger D (2018) Addressing function approximation error in actor-critic methods. 
In: ICML 2018, PMLR, pp 1587–1596 Goodfellow et al [2020] Goodfellow I, Pouget-Abadie J, Mirza M, et al (2020) Generative adversarial networks. Communications of the ACM 63(11):139–144 Haarnoja et al [2017] Haarnoja T, Tang H, Abbeel P, et al (2017) Reinforcement learning with deep energy-based policies. In: ICML 2017, PMLR, pp 1352–1361 Haarnoja et al [2018] Haarnoja T, Zhou A, Hartikainen K, et al (2018) Soft actor-critic algorithms and applications. arXiv preprint arXiv:181205905 Heess et al [2015] Heess N, Wayne G, Silver D, et al (2015) Learning continuous control policies by stochastic value gradients. Advances in neural information processing systems 28 Kingma and Welling [2013] Kingma DP, Welling M (2013) Auto-encoding variational bayes. arXiv preprint arXiv:13126114 Kingma et al [2014] Kingma DP, Mohamed S, Jimenez Rezende D, et al (2014) Semi-supervised learning with deep generative models. NIPS 2014 27 Konda and Tsitsiklis [1999] Konda V, Tsitsiklis J (1999) Actor-critic algorithms. Advances in neural information processing systems 12 Kuznetsov et al [2020] Kuznetsov A, Shvechikov P, Grishin A, et al (2020) Controlling overestimation bias with truncated mixture of continuous distributional quantile critics. In: International Conference on Machine Learning, PMLR, pp 5556–5566 Lillicrap et al [2015] Lillicrap TP, Hunt JJ, Pritzel A, et al (2015) Continuous control with deep reinforcement learning. arXiv preprint arXiv:150902971 Lindenberg et al [2022] Lindenberg B, Nordqvist J, Lindahl KO (2022) Conjugated discrete distributions for distributional reinforcement learning. In: AAAI 2022, pp 7516–7524 Ma et al [2020] Ma X, Xia L, Zhou Z, et al (2020) Dsac: distributional soft actor critic for risk-sensitive reinforcement learning. arXiv preprint arXiv:200414547 Marzouk et al [2016] Marzouk Y, Moselhy T, Parno M, et al (2016) Sampling via measure transport: An introduction. Handbook of uncertainty quantification 1:2 Mnih et al [2015] Mnih V, Kavukcuoglu K, Silver D, et al (2015) Human-level control through deep reinforcement learning. nature 518(7540):529–533 MORIMURA [2010] MORIMURA T (2010) Parametric return density estimation for reinforcement learning. In: Conference on Uncertainty in Artificial Intelligence, 2010 Nam et al [2021] Nam DW, Kim Y, Park CY (2021) Gmac: A distributional perspective on actor-critic framework. In: ICML 2021, PMLR, pp 7927–7936 Ororbia and Kifer [2022] Ororbia A, Kifer D (2022) The neural coding framework for learning generative models. Nature communications 13(1):2064 Parno and Marzouk [2018] Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. 
In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Chow Y, Ghavamzadeh M, Janson L, et al (2017) Risk-constrained reinforcement learning with percentile risk criteria. J Mach Learn Res 18(1):6070–6120 Creswell et al [2018] Creswell A, White T, Dumoulin V, et al (2018) Generative adversarial networks: An overview. IEEE signal processing magazine 35(1):53–65 Dabney et al [2018a] Dabney W, Ostrovski G, Silver D, et al (2018a) Implicit quantile networks for distributional reinforcement learning. In: ICML 2018, PMLR, pp 1096–1105 Dabney et al [2018b] Dabney W, Rowland M, Bellemare M, et al (2018b) Distributional reinforcement learning with quantile regression. In: AAAI 2018 Duan et al [2021] Duan J, Guan Y, Li SE, et al (2021) Distributional soft actor-critic: Off-policy reinforcement learning for addressing value estimation errors. IEEE transactions on neural networks and learning systems Engel et al [2005] Engel Y, Mannor S, Meir R (2005) Reinforcement learning with gaussian processes. In: Proceedings of the 22nd international conference on Machine learning, pp 201–208 Fujimoto et al [2018] Fujimoto S, Hoof H, Meger D (2018) Addressing function approximation error in actor-critic methods. In: ICML 2018, PMLR, pp 1587–1596 Goodfellow et al [2020] Goodfellow I, Pouget-Abadie J, Mirza M, et al (2020) Generative adversarial networks. Communications of the ACM 63(11):139–144 Haarnoja et al [2017] Haarnoja T, Tang H, Abbeel P, et al (2017) Reinforcement learning with deep energy-based policies. In: ICML 2017, PMLR, pp 1352–1361 Haarnoja et al [2018] Haarnoja T, Zhou A, Hartikainen K, et al (2018) Soft actor-critic algorithms and applications. arXiv preprint arXiv:181205905 Heess et al [2015] Heess N, Wayne G, Silver D, et al (2015) Learning continuous control policies by stochastic value gradients. Advances in neural information processing systems 28 Kingma and Welling [2013] Kingma DP, Welling M (2013) Auto-encoding variational bayes. 
arXiv preprint arXiv:13126114 Kingma et al [2014] Kingma DP, Mohamed S, Jimenez Rezende D, et al (2014) Semi-supervised learning with deep generative models. NIPS 2014 27 Konda and Tsitsiklis [1999] Konda V, Tsitsiklis J (1999) Actor-critic algorithms. Advances in neural information processing systems 12 Kuznetsov et al [2020] Kuznetsov A, Shvechikov P, Grishin A, et al (2020) Controlling overestimation bias with truncated mixture of continuous distributional quantile critics. In: International Conference on Machine Learning, PMLR, pp 5556–5566 Lillicrap et al [2015] Lillicrap TP, Hunt JJ, Pritzel A, et al (2015) Continuous control with deep reinforcement learning. arXiv preprint arXiv:150902971 Lindenberg et al [2022] Lindenberg B, Nordqvist J, Lindahl KO (2022) Conjugated discrete distributions for distributional reinforcement learning. In: AAAI 2022, pp 7516–7524 Ma et al [2020] Ma X, Xia L, Zhou Z, et al (2020) Dsac: distributional soft actor critic for risk-sensitive reinforcement learning. arXiv preprint arXiv:200414547 Marzouk et al [2016] Marzouk Y, Moselhy T, Parno M, et al (2016) Sampling via measure transport: An introduction. Handbook of uncertainty quantification 1:2 Mnih et al [2015] Mnih V, Kavukcuoglu K, Silver D, et al (2015) Human-level control through deep reinforcement learning. nature 518(7540):529–533 MORIMURA [2010] MORIMURA T (2010) Parametric return density estimation for reinforcement learning. In: Conference on Uncertainty in Artificial Intelligence, 2010 Nam et al [2021] Nam DW, Kim Y, Park CY (2021) Gmac: A distributional perspective on actor-critic framework. In: ICML 2021, PMLR, pp 7927–7936 Ororbia and Kifer [2022] Ororbia A, Kifer D (2022) The neural coding framework for learning generative models. Nature communications 13(1):2064 Parno and Marzouk [2018] Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. 
In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Creswell A, White T, Dumoulin V, et al (2018) Generative adversarial networks: An overview. IEEE signal processing magazine 35(1):53–65 Dabney et al [2018a] Dabney W, Ostrovski G, Silver D, et al (2018a) Implicit quantile networks for distributional reinforcement learning. In: ICML 2018, PMLR, pp 1096–1105 Dabney et al [2018b] Dabney W, Rowland M, Bellemare M, et al (2018b) Distributional reinforcement learning with quantile regression. In: AAAI 2018 Duan et al [2021] Duan J, Guan Y, Li SE, et al (2021) Distributional soft actor-critic: Off-policy reinforcement learning for addressing value estimation errors. IEEE transactions on neural networks and learning systems Engel et al [2005] Engel Y, Mannor S, Meir R (2005) Reinforcement learning with gaussian processes. In: Proceedings of the 22nd international conference on Machine learning, pp 201–208 Fujimoto et al [2018] Fujimoto S, Hoof H, Meger D (2018) Addressing function approximation error in actor-critic methods. In: ICML 2018, PMLR, pp 1587–1596 Goodfellow et al [2020] Goodfellow I, Pouget-Abadie J, Mirza M, et al (2020) Generative adversarial networks. Communications of the ACM 63(11):139–144 Haarnoja et al [2017] Haarnoja T, Tang H, Abbeel P, et al (2017) Reinforcement learning with deep energy-based policies. In: ICML 2017, PMLR, pp 1352–1361 Haarnoja et al [2018] Haarnoja T, Zhou A, Hartikainen K, et al (2018) Soft actor-critic algorithms and applications. arXiv preprint arXiv:181205905 Heess et al [2015] Heess N, Wayne G, Silver D, et al (2015) Learning continuous control policies by stochastic value gradients. Advances in neural information processing systems 28 Kingma and Welling [2013] Kingma DP, Welling M (2013) Auto-encoding variational bayes. arXiv preprint arXiv:13126114 Kingma et al [2014] Kingma DP, Mohamed S, Jimenez Rezende D, et al (2014) Semi-supervised learning with deep generative models. NIPS 2014 27 Konda and Tsitsiklis [1999] Konda V, Tsitsiklis J (1999) Actor-critic algorithms. Advances in neural information processing systems 12 Kuznetsov et al [2020] Kuznetsov A, Shvechikov P, Grishin A, et al (2020) Controlling overestimation bias with truncated mixture of continuous distributional quantile critics. In: International Conference on Machine Learning, PMLR, pp 5556–5566 Lillicrap et al [2015] Lillicrap TP, Hunt JJ, Pritzel A, et al (2015) Continuous control with deep reinforcement learning. arXiv preprint arXiv:150902971 Lindenberg et al [2022] Lindenberg B, Nordqvist J, Lindahl KO (2022) Conjugated discrete distributions for distributional reinforcement learning. 
In: AAAI 2022, pp 7516–7524 Ma et al [2020] Ma X, Xia L, Zhou Z, et al (2020) Dsac: distributional soft actor critic for risk-sensitive reinforcement learning. arXiv preprint arXiv:200414547 Marzouk et al [2016] Marzouk Y, Moselhy T, Parno M, et al (2016) Sampling via measure transport: An introduction. Handbook of uncertainty quantification 1:2 Mnih et al [2015] Mnih V, Kavukcuoglu K, Silver D, et al (2015) Human-level control through deep reinforcement learning. nature 518(7540):529–533 MORIMURA [2010] MORIMURA T (2010) Parametric return density estimation for reinforcement learning. In: Conference on Uncertainty in Artificial Intelligence, 2010 Nam et al [2021] Nam DW, Kim Y, Park CY (2021) Gmac: A distributional perspective on actor-critic framework. In: ICML 2021, PMLR, pp 7927–7936 Ororbia and Kifer [2022] Ororbia A, Kifer D (2022) The neural coding framework for learning generative models. Nature communications 13(1):2064 Parno and Marzouk [2018] Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. 
Carnegie Mellon University Dabney W, Ostrovski G, Silver D, et al (2018a) Implicit quantile networks for distributional reinforcement learning. In: ICML 2018, PMLR, pp 1096–1105 Dabney et al [2018b] Dabney W, Rowland M, Bellemare M, et al (2018b) Distributional reinforcement learning with quantile regression. In: AAAI 2018 Duan et al [2021] Duan J, Guan Y, Li SE, et al (2021) Distributional soft actor-critic: Off-policy reinforcement learning for addressing value estimation errors. IEEE transactions on neural networks and learning systems Engel et al [2005] Engel Y, Mannor S, Meir R (2005) Reinforcement learning with gaussian processes. In: Proceedings of the 22nd international conference on Machine learning, pp 201–208 Fujimoto et al [2018] Fujimoto S, Hoof H, Meger D (2018) Addressing function approximation error in actor-critic methods. In: ICML 2018, PMLR, pp 1587–1596 Goodfellow et al [2020] Goodfellow I, Pouget-Abadie J, Mirza M, et al (2020) Generative adversarial networks. Communications of the ACM 63(11):139–144 Haarnoja et al [2017] Haarnoja T, Tang H, Abbeel P, et al (2017) Reinforcement learning with deep energy-based policies. In: ICML 2017, PMLR, pp 1352–1361 Haarnoja et al [2018] Haarnoja T, Zhou A, Hartikainen K, et al (2018) Soft actor-critic algorithms and applications. arXiv preprint arXiv:181205905 Heess et al [2015] Heess N, Wayne G, Silver D, et al (2015) Learning continuous control policies by stochastic value gradients. Advances in neural information processing systems 28 Kingma and Welling [2013] Kingma DP, Welling M (2013) Auto-encoding variational bayes. arXiv preprint arXiv:13126114 Kingma et al [2014] Kingma DP, Mohamed S, Jimenez Rezende D, et al (2014) Semi-supervised learning with deep generative models. NIPS 2014 27 Konda and Tsitsiklis [1999] Konda V, Tsitsiklis J (1999) Actor-critic algorithms. Advances in neural information processing systems 12 Kuznetsov et al [2020] Kuznetsov A, Shvechikov P, Grishin A, et al (2020) Controlling overestimation bias with truncated mixture of continuous distributional quantile critics. In: International Conference on Machine Learning, PMLR, pp 5556–5566 Lillicrap et al [2015] Lillicrap TP, Hunt JJ, Pritzel A, et al (2015) Continuous control with deep reinforcement learning. arXiv preprint arXiv:150902971 Lindenberg et al [2022] Lindenberg B, Nordqvist J, Lindahl KO (2022) Conjugated discrete distributions for distributional reinforcement learning. In: AAAI 2022, pp 7516–7524 Ma et al [2020] Ma X, Xia L, Zhou Z, et al (2020) Dsac: distributional soft actor critic for risk-sensitive reinforcement learning. arXiv preprint arXiv:200414547 Marzouk et al [2016] Marzouk Y, Moselhy T, Parno M, et al (2016) Sampling via measure transport: An introduction. Handbook of uncertainty quantification 1:2 Mnih et al [2015] Mnih V, Kavukcuoglu K, Silver D, et al (2015) Human-level control through deep reinforcement learning. nature 518(7540):529–533 MORIMURA [2010] MORIMURA T (2010) Parametric return density estimation for reinforcement learning. In: Conference on Uncertainty in Artificial Intelligence, 2010 Nam et al [2021] Nam DW, Kim Y, Park CY (2021) Gmac: A distributional perspective on actor-critic framework. In: ICML 2021, PMLR, pp 7927–7936 Ororbia and Kifer [2022] Ororbia A, Kifer D (2022) The neural coding framework for learning generative models. Nature communications 13(1):2064 Parno and Marzouk [2018] Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. 
SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Dabney W, Rowland M, Bellemare M, et al (2018b) Distributional reinforcement learning with quantile regression. In: AAAI 2018 Duan et al [2021] Duan J, Guan Y, Li SE, et al (2021) Distributional soft actor-critic: Off-policy reinforcement learning for addressing value estimation errors. IEEE transactions on neural networks and learning systems Engel et al [2005] Engel Y, Mannor S, Meir R (2005) Reinforcement learning with gaussian processes. In: Proceedings of the 22nd international conference on Machine learning, pp 201–208 Fujimoto et al [2018] Fujimoto S, Hoof H, Meger D (2018) Addressing function approximation error in actor-critic methods. In: ICML 2018, PMLR, pp 1587–1596 Goodfellow et al [2020] Goodfellow I, Pouget-Abadie J, Mirza M, et al (2020) Generative adversarial networks. Communications of the ACM 63(11):139–144 Haarnoja et al [2017] Haarnoja T, Tang H, Abbeel P, et al (2017) Reinforcement learning with deep energy-based policies. 
In: ICML 2017, PMLR, pp 1352–1361 Haarnoja et al [2018] Haarnoja T, Zhou A, Hartikainen K, et al (2018) Soft actor-critic algorithms and applications. arXiv preprint arXiv:181205905 Heess et al [2015] Heess N, Wayne G, Silver D, et al (2015) Learning continuous control policies by stochastic value gradients. Advances in neural information processing systems 28 Kingma and Welling [2013] Kingma DP, Welling M (2013) Auto-encoding variational bayes. arXiv preprint arXiv:13126114 Kingma et al [2014] Kingma DP, Mohamed S, Jimenez Rezende D, et al (2014) Semi-supervised learning with deep generative models. NIPS 2014 27 Konda and Tsitsiklis [1999] Konda V, Tsitsiklis J (1999) Actor-critic algorithms. Advances in neural information processing systems 12 Kuznetsov et al [2020] Kuznetsov A, Shvechikov P, Grishin A, et al (2020) Controlling overestimation bias with truncated mixture of continuous distributional quantile critics. In: International Conference on Machine Learning, PMLR, pp 5556–5566 Lillicrap et al [2015] Lillicrap TP, Hunt JJ, Pritzel A, et al (2015) Continuous control with deep reinforcement learning. arXiv preprint arXiv:150902971 Lindenberg et al [2022] Lindenberg B, Nordqvist J, Lindahl KO (2022) Conjugated discrete distributions for distributional reinforcement learning. In: AAAI 2022, pp 7516–7524 Ma et al [2020] Ma X, Xia L, Zhou Z, et al (2020) Dsac: distributional soft actor critic for risk-sensitive reinforcement learning. arXiv preprint arXiv:200414547 Marzouk et al [2016] Marzouk Y, Moselhy T, Parno M, et al (2016) Sampling via measure transport: An introduction. Handbook of uncertainty quantification 1:2 Mnih et al [2015] Mnih V, Kavukcuoglu K, Silver D, et al (2015) Human-level control through deep reinforcement learning. nature 518(7540):529–533 MORIMURA [2010] MORIMURA T (2010) Parametric return density estimation for reinforcement learning. In: Conference on Uncertainty in Artificial Intelligence, 2010 Nam et al [2021] Nam DW, Kim Y, Park CY (2021) Gmac: A distributional perspective on actor-critic framework. In: ICML 2021, PMLR, pp 7927–7936 Ororbia and Kifer [2022] Ororbia A, Kifer D (2022) The neural coding framework for learning generative models. Nature communications 13(1):2064 Parno and Marzouk [2018] Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. 
Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Duan J, Guan Y, Li SE, et al (2021) Distributional soft actor-critic: Off-policy reinforcement learning for addressing value estimation errors. IEEE transactions on neural networks and learning systems Engel et al [2005] Engel Y, Mannor S, Meir R (2005) Reinforcement learning with gaussian processes. In: Proceedings of the 22nd international conference on Machine learning, pp 201–208 Fujimoto et al [2018] Fujimoto S, Hoof H, Meger D (2018) Addressing function approximation error in actor-critic methods. In: ICML 2018, PMLR, pp 1587–1596 Goodfellow et al [2020] Goodfellow I, Pouget-Abadie J, Mirza M, et al (2020) Generative adversarial networks. Communications of the ACM 63(11):139–144 Haarnoja et al [2017] Haarnoja T, Tang H, Abbeel P, et al (2017) Reinforcement learning with deep energy-based policies. In: ICML 2017, PMLR, pp 1352–1361 Haarnoja et al [2018] Haarnoja T, Zhou A, Hartikainen K, et al (2018) Soft actor-critic algorithms and applications. arXiv preprint arXiv:181205905 Heess et al [2015] Heess N, Wayne G, Silver D, et al (2015) Learning continuous control policies by stochastic value gradients. Advances in neural information processing systems 28 Kingma and Welling [2013] Kingma DP, Welling M (2013) Auto-encoding variational bayes. arXiv preprint arXiv:13126114 Kingma et al [2014] Kingma DP, Mohamed S, Jimenez Rezende D, et al (2014) Semi-supervised learning with deep generative models. NIPS 2014 27 Konda and Tsitsiklis [1999] Konda V, Tsitsiklis J (1999) Actor-critic algorithms. Advances in neural information processing systems 12 Kuznetsov et al [2020] Kuznetsov A, Shvechikov P, Grishin A, et al (2020) Controlling overestimation bias with truncated mixture of continuous distributional quantile critics. In: International Conference on Machine Learning, PMLR, pp 5556–5566 Lillicrap et al [2015] Lillicrap TP, Hunt JJ, Pritzel A, et al (2015) Continuous control with deep reinforcement learning. arXiv preprint arXiv:150902971 Lindenberg et al [2022] Lindenberg B, Nordqvist J, Lindahl KO (2022) Conjugated discrete distributions for distributional reinforcement learning. 
In: AAAI 2022, pp 7516–7524 Ma et al [2020] Ma X, Xia L, Zhou Z, et al (2020) Dsac: distributional soft actor critic for risk-sensitive reinforcement learning. arXiv preprint arXiv:200414547 Marzouk et al [2016] Marzouk Y, Moselhy T, Parno M, et al (2016) Sampling via measure transport: An introduction. Handbook of uncertainty quantification 1:2 Mnih et al [2015] Mnih V, Kavukcuoglu K, Silver D, et al (2015) Human-level control through deep reinforcement learning. nature 518(7540):529–533 MORIMURA [2010] MORIMURA T (2010) Parametric return density estimation for reinforcement learning. In: Conference on Uncertainty in Artificial Intelligence, 2010 Nam et al [2021] Nam DW, Kim Y, Park CY (2021) Gmac: A distributional perspective on actor-critic framework. In: ICML 2021, PMLR, pp 7927–7936 Ororbia and Kifer [2022] Ororbia A, Kifer D (2022) The neural coding framework for learning generative models. Nature communications 13(1):2064 Parno and Marzouk [2018] Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. 
Carnegie Mellon University Engel Y, Mannor S, Meir R (2005) Reinforcement learning with gaussian processes. In: Proceedings of the 22nd international conference on Machine learning, pp 201–208 Fujimoto et al [2018] Fujimoto S, Hoof H, Meger D (2018) Addressing function approximation error in actor-critic methods. In: ICML 2018, PMLR, pp 1587–1596 Goodfellow et al [2020] Goodfellow I, Pouget-Abadie J, Mirza M, et al (2020) Generative adversarial networks. Communications of the ACM 63(11):139–144 Haarnoja et al [2017] Haarnoja T, Tang H, Abbeel P, et al (2017) Reinforcement learning with deep energy-based policies. In: ICML 2017, PMLR, pp 1352–1361 Haarnoja et al [2018] Haarnoja T, Zhou A, Hartikainen K, et al (2018) Soft actor-critic algorithms and applications. arXiv preprint arXiv:181205905 Heess et al [2015] Heess N, Wayne G, Silver D, et al (2015) Learning continuous control policies by stochastic value gradients. Advances in neural information processing systems 28 Kingma and Welling [2013] Kingma DP, Welling M (2013) Auto-encoding variational bayes. arXiv preprint arXiv:13126114 Kingma et al [2014] Kingma DP, Mohamed S, Jimenez Rezende D, et al (2014) Semi-supervised learning with deep generative models. NIPS 2014 27 Konda and Tsitsiklis [1999] Konda V, Tsitsiklis J (1999) Actor-critic algorithms. Advances in neural information processing systems 12 Kuznetsov et al [2020] Kuznetsov A, Shvechikov P, Grishin A, et al (2020) Controlling overestimation bias with truncated mixture of continuous distributional quantile critics. In: International Conference on Machine Learning, PMLR, pp 5556–5566 Lillicrap et al [2015] Lillicrap TP, Hunt JJ, Pritzel A, et al (2015) Continuous control with deep reinforcement learning. arXiv preprint arXiv:150902971 Lindenberg et al [2022] Lindenberg B, Nordqvist J, Lindahl KO (2022) Conjugated discrete distributions for distributional reinforcement learning. In: AAAI 2022, pp 7516–7524 Ma et al [2020] Ma X, Xia L, Zhou Z, et al (2020) Dsac: distributional soft actor critic for risk-sensitive reinforcement learning. arXiv preprint arXiv:200414547 Marzouk et al [2016] Marzouk Y, Moselhy T, Parno M, et al (2016) Sampling via measure transport: An introduction. Handbook of uncertainty quantification 1:2 Mnih et al [2015] Mnih V, Kavukcuoglu K, Silver D, et al (2015) Human-level control through deep reinforcement learning. nature 518(7540):529–533 MORIMURA [2010] MORIMURA T (2010) Parametric return density estimation for reinforcement learning. In: Conference on Uncertainty in Artificial Intelligence, 2010 Nam et al [2021] Nam DW, Kim Y, Park CY (2021) Gmac: A distributional perspective on actor-critic framework. In: ICML 2021, PMLR, pp 7927–7936 Ororbia and Kifer [2022] Ororbia A, Kifer D (2022) The neural coding framework for learning generative models. Nature communications 13(1):2064 Parno and Marzouk [2018] Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. 
Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Fujimoto S, Hoof H, Meger D (2018) Addressing function approximation error in actor-critic methods. In: ICML 2018, PMLR, pp 1587–1596 Goodfellow et al [2020] Goodfellow I, Pouget-Abadie J, Mirza M, et al (2020) Generative adversarial networks. Communications of the ACM 63(11):139–144 Haarnoja et al [2017] Haarnoja T, Tang H, Abbeel P, et al (2017) Reinforcement learning with deep energy-based policies. In: ICML 2017, PMLR, pp 1352–1361 Haarnoja et al [2018] Haarnoja T, Zhou A, Hartikainen K, et al (2018) Soft actor-critic algorithms and applications. arXiv preprint arXiv:181205905 Heess et al [2015] Heess N, Wayne G, Silver D, et al (2015) Learning continuous control policies by stochastic value gradients. Advances in neural information processing systems 28 Kingma and Welling [2013] Kingma DP, Welling M (2013) Auto-encoding variational bayes. arXiv preprint arXiv:13126114 Kingma et al [2014] Kingma DP, Mohamed S, Jimenez Rezende D, et al (2014) Semi-supervised learning with deep generative models. NIPS 2014 27 Konda and Tsitsiklis [1999] Konda V, Tsitsiklis J (1999) Actor-critic algorithms. Advances in neural information processing systems 12 Kuznetsov et al [2020] Kuznetsov A, Shvechikov P, Grishin A, et al (2020) Controlling overestimation bias with truncated mixture of continuous distributional quantile critics. In: International Conference on Machine Learning, PMLR, pp 5556–5566 Lillicrap et al [2015] Lillicrap TP, Hunt JJ, Pritzel A, et al (2015) Continuous control with deep reinforcement learning. 
arXiv preprint arXiv:150902971 Lindenberg et al [2022] Lindenberg B, Nordqvist J, Lindahl KO (2022) Conjugated discrete distributions for distributional reinforcement learning. In: AAAI 2022, pp 7516–7524 Ma et al [2020] Ma X, Xia L, Zhou Z, et al (2020) Dsac: distributional soft actor critic for risk-sensitive reinforcement learning. arXiv preprint arXiv:200414547 Marzouk et al [2016] Marzouk Y, Moselhy T, Parno M, et al (2016) Sampling via measure transport: An introduction. Handbook of uncertainty quantification 1:2 Mnih et al [2015] Mnih V, Kavukcuoglu K, Silver D, et al (2015) Human-level control through deep reinforcement learning. nature 518(7540):529–533 MORIMURA [2010] MORIMURA T (2010) Parametric return density estimation for reinforcement learning. In: Conference on Uncertainty in Artificial Intelligence, 2010 Nam et al [2021] Nam DW, Kim Y, Park CY (2021) Gmac: A distributional perspective on actor-critic framework. In: ICML 2021, PMLR, pp 7927–7936 Ororbia and Kifer [2022] Ororbia A, Kifer D (2022) The neural coding framework for learning generative models. Nature communications 13(1):2064 Parno and Marzouk [2018] Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. 
Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Goodfellow I, Pouget-Abadie J, Mirza M, et al (2020) Generative adversarial networks. Communications of the ACM 63(11):139–144 Haarnoja et al [2017] Haarnoja T, Tang H, Abbeel P, et al (2017) Reinforcement learning with deep energy-based policies. In: ICML 2017, PMLR, pp 1352–1361 Haarnoja et al [2018] Haarnoja T, Zhou A, Hartikainen K, et al (2018) Soft actor-critic algorithms and applications. arXiv preprint arXiv:181205905 Heess et al [2015] Heess N, Wayne G, Silver D, et al (2015) Learning continuous control policies by stochastic value gradients. Advances in neural information processing systems 28 Kingma and Welling [2013] Kingma DP, Welling M (2013) Auto-encoding variational bayes. arXiv preprint arXiv:13126114 Kingma et al [2014] Kingma DP, Mohamed S, Jimenez Rezende D, et al (2014) Semi-supervised learning with deep generative models. NIPS 2014 27 Konda and Tsitsiklis [1999] Konda V, Tsitsiklis J (1999) Actor-critic algorithms. Advances in neural information processing systems 12 Kuznetsov et al [2020] Kuznetsov A, Shvechikov P, Grishin A, et al (2020) Controlling overestimation bias with truncated mixture of continuous distributional quantile critics. In: International Conference on Machine Learning, PMLR, pp 5556–5566 Lillicrap et al [2015] Lillicrap TP, Hunt JJ, Pritzel A, et al (2015) Continuous control with deep reinforcement learning. arXiv preprint arXiv:150902971 Lindenberg et al [2022] Lindenberg B, Nordqvist J, Lindahl KO (2022) Conjugated discrete distributions for distributional reinforcement learning. In: AAAI 2022, pp 7516–7524 Ma et al [2020] Ma X, Xia L, Zhou Z, et al (2020) Dsac: distributional soft actor critic for risk-sensitive reinforcement learning. arXiv preprint arXiv:200414547 Marzouk et al [2016] Marzouk Y, Moselhy T, Parno M, et al (2016) Sampling via measure transport: An introduction. Handbook of uncertainty quantification 1:2 Mnih et al [2015] Mnih V, Kavukcuoglu K, Silver D, et al (2015) Human-level control through deep reinforcement learning. nature 518(7540):529–533 MORIMURA [2010] MORIMURA T (2010) Parametric return density estimation for reinforcement learning. In: Conference on Uncertainty in Artificial Intelligence, 2010 Nam et al [2021] Nam DW, Kim Y, Park CY (2021) Gmac: A distributional perspective on actor-critic framework. In: ICML 2021, PMLR, pp 7927–7936 Ororbia and Kifer [2022] Ororbia A, Kifer D (2022) The neural coding framework for learning generative models. Nature communications 13(1):2064 Parno and Marzouk [2018] Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. 
Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. 
Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. 
arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. 
In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University
- Armengol Urpà N, Curi S, Krause A (2021) Risk-averse offline reinforcement learning. In: ICLR 2021, OpenReview Balbás et al [2009] Balbás A, Garrido J, Mayoral S (2009) Properties of distortion risk measures. Methodology and Computing in Applied Probability 11(3):385–399 Baptista et al [2023] Baptista R, Hosseini B, Kovachki NB, et al (2023) An approximation theory framework for measure-transport sampling algorithms. arXiv preprint arXiv:230213965 Barth-Maron et al [2018] Barth-Maron G, Hoffman MW, Budden D, et al (2018) Distributed distributional deterministic policy gradients. In: ICLR 2018 Bellemare et al [2017] Bellemare MG, Dabney W, Munos R (2017) A distributional perspective on reinforcement learning. In: ICML 2017, PMLR, pp 449–458 Bellemare et al [2023] Bellemare MG, Dabney W, Rowland M (2023) Distributional Reinforcement Learning. MIT Press, http://www.distributional-rl.org Burda et al [2019] Burda Y, Edwards H, Storkey A, et al (2019) Exploration by random network distillation. In: Seventh International Conference on Learning Representations, pp 1–17 Choi et al [2021] Choi J, Dance C, Kim Je, et al (2021) Risk-conditioned distributional soft actor-critic for risk-sensitive navigation. In: ICRA 2021, IEEE, pp 8337–8344 Chow et al [2015] Chow Y, Tamar A, Mannor S, et al (2015) Risk-sensitive and robust decision-making: a cvar optimization approach. Advances in neural information processing systems 28 Chow et al [2017] Chow Y, Ghavamzadeh M, Janson L, et al (2017) Risk-constrained reinforcement learning with percentile risk criteria. J Mach Learn Res 18(1):6070–6120 Creswell et al [2018] Creswell A, White T, Dumoulin V, et al (2018) Generative adversarial networks: An overview. IEEE signal processing magazine 35(1):53–65 Dabney et al [2018a] Dabney W, Ostrovski G, Silver D, et al (2018a) Implicit quantile networks for distributional reinforcement learning. In: ICML 2018, PMLR, pp 1096–1105 Dabney et al [2018b] Dabney W, Rowland M, Bellemare M, et al (2018b) Distributional reinforcement learning with quantile regression. In: AAAI 2018 Duan et al [2021] Duan J, Guan Y, Li SE, et al (2021) Distributional soft actor-critic: Off-policy reinforcement learning for addressing value estimation errors. IEEE transactions on neural networks and learning systems Engel et al [2005] Engel Y, Mannor S, Meir R (2005) Reinforcement learning with gaussian processes. In: Proceedings of the 22nd international conference on Machine learning, pp 201–208 Fujimoto et al [2018] Fujimoto S, Hoof H, Meger D (2018) Addressing function approximation error in actor-critic methods. In: ICML 2018, PMLR, pp 1587–1596 Goodfellow et al [2020] Goodfellow I, Pouget-Abadie J, Mirza M, et al (2020) Generative adversarial networks. Communications of the ACM 63(11):139–144 Haarnoja et al [2017] Haarnoja T, Tang H, Abbeel P, et al (2017) Reinforcement learning with deep energy-based policies. In: ICML 2017, PMLR, pp 1352–1361 Haarnoja et al [2018] Haarnoja T, Zhou A, Hartikainen K, et al (2018) Soft actor-critic algorithms and applications. arXiv preprint arXiv:181205905 Heess et al [2015] Heess N, Wayne G, Silver D, et al (2015) Learning continuous control policies by stochastic value gradients. Advances in neural information processing systems 28 Kingma and Welling [2013] Kingma DP, Welling M (2013) Auto-encoding variational bayes. arXiv preprint arXiv:13126114 Kingma et al [2014] Kingma DP, Mohamed S, Jimenez Rezende D, et al (2014) Semi-supervised learning with deep generative models. 
NIPS 2014 27 Konda and Tsitsiklis [1999] Konda V, Tsitsiklis J (1999) Actor-critic algorithms. Advances in neural information processing systems 12 Kuznetsov et al [2020] Kuznetsov A, Shvechikov P, Grishin A, et al (2020) Controlling overestimation bias with truncated mixture of continuous distributional quantile critics. In: International Conference on Machine Learning, PMLR, pp 5556–5566 Lillicrap et al [2015] Lillicrap TP, Hunt JJ, Pritzel A, et al (2015) Continuous control with deep reinforcement learning. arXiv preprint arXiv:150902971 Lindenberg et al [2022] Lindenberg B, Nordqvist J, Lindahl KO (2022) Conjugated discrete distributions for distributional reinforcement learning. In: AAAI 2022, pp 7516–7524 Ma et al [2020] Ma X, Xia L, Zhou Z, et al (2020) Dsac: distributional soft actor critic for risk-sensitive reinforcement learning. arXiv preprint arXiv:200414547 Marzouk et al [2016] Marzouk Y, Moselhy T, Parno M, et al (2016) Sampling via measure transport: An introduction. Handbook of uncertainty quantification 1:2 Mnih et al [2015] Mnih V, Kavukcuoglu K, Silver D, et al (2015) Human-level control through deep reinforcement learning. nature 518(7540):529–533 MORIMURA [2010] MORIMURA T (2010) Parametric return density estimation for reinforcement learning. In: Conference on Uncertainty in Artificial Intelligence, 2010 Nam et al [2021] Nam DW, Kim Y, Park CY (2021) Gmac: A distributional perspective on actor-critic framework. In: ICML 2021, PMLR, pp 7927–7936 Ororbia and Kifer [2022] Ororbia A, Kifer D (2022) The neural coding framework for learning generative models. Nature communications 13(1):2064 Parno and Marzouk [2018] Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. 
In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Balbás A, Garrido J, Mayoral S (2009) Properties of distortion risk measures. Methodology and Computing in Applied Probability 11(3):385–399 Baptista et al [2023] Baptista R, Hosseini B, Kovachki NB, et al (2023) An approximation theory framework for measure-transport sampling algorithms. arXiv preprint arXiv:230213965 Barth-Maron et al [2018] Barth-Maron G, Hoffman MW, Budden D, et al (2018) Distributed distributional deterministic policy gradients. In: ICLR 2018 Bellemare et al [2017] Bellemare MG, Dabney W, Munos R (2017) A distributional perspective on reinforcement learning. In: ICML 2017, PMLR, pp 449–458 Bellemare et al [2023] Bellemare MG, Dabney W, Rowland M (2023) Distributional Reinforcement Learning. MIT Press, http://www.distributional-rl.org Burda et al [2019] Burda Y, Edwards H, Storkey A, et al (2019) Exploration by random network distillation. In: Seventh International Conference on Learning Representations, pp 1–17 Choi et al [2021] Choi J, Dance C, Kim Je, et al (2021) Risk-conditioned distributional soft actor-critic for risk-sensitive navigation. In: ICRA 2021, IEEE, pp 8337–8344 Chow et al [2015] Chow Y, Tamar A, Mannor S, et al (2015) Risk-sensitive and robust decision-making: a cvar optimization approach. Advances in neural information processing systems 28 Chow et al [2017] Chow Y, Ghavamzadeh M, Janson L, et al (2017) Risk-constrained reinforcement learning with percentile risk criteria. J Mach Learn Res 18(1):6070–6120 Creswell et al [2018] Creswell A, White T, Dumoulin V, et al (2018) Generative adversarial networks: An overview. IEEE signal processing magazine 35(1):53–65 Dabney et al [2018a] Dabney W, Ostrovski G, Silver D, et al (2018a) Implicit quantile networks for distributional reinforcement learning. In: ICML 2018, PMLR, pp 1096–1105 Dabney et al [2018b] Dabney W, Rowland M, Bellemare M, et al (2018b) Distributional reinforcement learning with quantile regression. In: AAAI 2018 Duan et al [2021] Duan J, Guan Y, Li SE, et al (2021) Distributional soft actor-critic: Off-policy reinforcement learning for addressing value estimation errors. IEEE transactions on neural networks and learning systems Engel et al [2005] Engel Y, Mannor S, Meir R (2005) Reinforcement learning with gaussian processes. In: Proceedings of the 22nd international conference on Machine learning, pp 201–208 Fujimoto et al [2018] Fujimoto S, Hoof H, Meger D (2018) Addressing function approximation error in actor-critic methods. In: ICML 2018, PMLR, pp 1587–1596 Goodfellow et al [2020] Goodfellow I, Pouget-Abadie J, Mirza M, et al (2020) Generative adversarial networks. Communications of the ACM 63(11):139–144 Haarnoja et al [2017] Haarnoja T, Tang H, Abbeel P, et al (2017) Reinforcement learning with deep energy-based policies. 
In: ICML 2017, PMLR, pp 1352–1361 Haarnoja et al [2018] Haarnoja T, Zhou A, Hartikainen K, et al (2018) Soft actor-critic algorithms and applications. arXiv preprint arXiv:181205905 Heess et al [2015] Heess N, Wayne G, Silver D, et al (2015) Learning continuous control policies by stochastic value gradients. Advances in neural information processing systems 28 Kingma and Welling [2013] Kingma DP, Welling M (2013) Auto-encoding variational bayes. arXiv preprint arXiv:13126114 Kingma et al [2014] Kingma DP, Mohamed S, Jimenez Rezende D, et al (2014) Semi-supervised learning with deep generative models. NIPS 2014 27 Konda and Tsitsiklis [1999] Konda V, Tsitsiklis J (1999) Actor-critic algorithms. Advances in neural information processing systems 12 Kuznetsov et al [2020] Kuznetsov A, Shvechikov P, Grishin A, et al (2020) Controlling overestimation bias with truncated mixture of continuous distributional quantile critics. In: International Conference on Machine Learning, PMLR, pp 5556–5566 Lillicrap et al [2015] Lillicrap TP, Hunt JJ, Pritzel A, et al (2015) Continuous control with deep reinforcement learning. arXiv preprint arXiv:150902971 Lindenberg et al [2022] Lindenberg B, Nordqvist J, Lindahl KO (2022) Conjugated discrete distributions for distributional reinforcement learning. In: AAAI 2022, pp 7516–7524 Ma et al [2020] Ma X, Xia L, Zhou Z, et al (2020) Dsac: distributional soft actor critic for risk-sensitive reinforcement learning. arXiv preprint arXiv:200414547 Marzouk et al [2016] Marzouk Y, Moselhy T, Parno M, et al (2016) Sampling via measure transport: An introduction. Handbook of uncertainty quantification 1:2 Mnih et al [2015] Mnih V, Kavukcuoglu K, Silver D, et al (2015) Human-level control through deep reinforcement learning. nature 518(7540):529–533 MORIMURA [2010] MORIMURA T (2010) Parametric return density estimation for reinforcement learning. In: Conference on Uncertainty in Artificial Intelligence, 2010 Nam et al [2021] Nam DW, Kim Y, Park CY (2021) Gmac: A distributional perspective on actor-critic framework. In: ICML 2021, PMLR, pp 7927–7936 Ororbia and Kifer [2022] Ororbia A, Kifer D (2022) The neural coding framework for learning generative models. Nature communications 13(1):2064 Parno and Marzouk [2018] Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. 
Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Baptista R, Hosseini B, Kovachki NB, et al (2023) An approximation theory framework for measure-transport sampling algorithms. arXiv preprint arXiv:230213965 Barth-Maron et al [2018] Barth-Maron G, Hoffman MW, Budden D, et al (2018) Distributed distributional deterministic policy gradients. In: ICLR 2018 Bellemare et al [2017] Bellemare MG, Dabney W, Munos R (2017) A distributional perspective on reinforcement learning. In: ICML 2017, PMLR, pp 449–458 Bellemare et al [2023] Bellemare MG, Dabney W, Rowland M (2023) Distributional Reinforcement Learning. MIT Press, http://www.distributional-rl.org Burda et al [2019] Burda Y, Edwards H, Storkey A, et al (2019) Exploration by random network distillation. In: Seventh International Conference on Learning Representations, pp 1–17 Choi et al [2021] Choi J, Dance C, Kim Je, et al (2021) Risk-conditioned distributional soft actor-critic for risk-sensitive navigation. In: ICRA 2021, IEEE, pp 8337–8344 Chow et al [2015] Chow Y, Tamar A, Mannor S, et al (2015) Risk-sensitive and robust decision-making: a cvar optimization approach. Advances in neural information processing systems 28 Chow et al [2017] Chow Y, Ghavamzadeh M, Janson L, et al (2017) Risk-constrained reinforcement learning with percentile risk criteria. J Mach Learn Res 18(1):6070–6120 Creswell et al [2018] Creswell A, White T, Dumoulin V, et al (2018) Generative adversarial networks: An overview. IEEE signal processing magazine 35(1):53–65 Dabney et al [2018a] Dabney W, Ostrovski G, Silver D, et al (2018a) Implicit quantile networks for distributional reinforcement learning. In: ICML 2018, PMLR, pp 1096–1105 Dabney et al [2018b] Dabney W, Rowland M, Bellemare M, et al (2018b) Distributional reinforcement learning with quantile regression. In: AAAI 2018 Duan et al [2021] Duan J, Guan Y, Li SE, et al (2021) Distributional soft actor-critic: Off-policy reinforcement learning for addressing value estimation errors. IEEE transactions on neural networks and learning systems Engel et al [2005] Engel Y, Mannor S, Meir R (2005) Reinforcement learning with gaussian processes. 
In: Proceedings of the 22nd international conference on Machine learning, pp 201–208 Fujimoto et al [2018] Fujimoto S, Hoof H, Meger D (2018) Addressing function approximation error in actor-critic methods. In: ICML 2018, PMLR, pp 1587–1596 Goodfellow et al [2020] Goodfellow I, Pouget-Abadie J, Mirza M, et al (2020) Generative adversarial networks. Communications of the ACM 63(11):139–144 Haarnoja et al [2017] Haarnoja T, Tang H, Abbeel P, et al (2017) Reinforcement learning with deep energy-based policies. In: ICML 2017, PMLR, pp 1352–1361 Haarnoja et al [2018] Haarnoja T, Zhou A, Hartikainen K, et al (2018) Soft actor-critic algorithms and applications. arXiv preprint arXiv:181205905 Heess et al [2015] Heess N, Wayne G, Silver D, et al (2015) Learning continuous control policies by stochastic value gradients. Advances in neural information processing systems 28 Kingma and Welling [2013] Kingma DP, Welling M (2013) Auto-encoding variational bayes. arXiv preprint arXiv:13126114 Kingma et al [2014] Kingma DP, Mohamed S, Jimenez Rezende D, et al (2014) Semi-supervised learning with deep generative models. NIPS 2014 27 Konda and Tsitsiklis [1999] Konda V, Tsitsiklis J (1999) Actor-critic algorithms. Advances in neural information processing systems 12 Kuznetsov et al [2020] Kuznetsov A, Shvechikov P, Grishin A, et al (2020) Controlling overestimation bias with truncated mixture of continuous distributional quantile critics. In: International Conference on Machine Learning, PMLR, pp 5556–5566 Lillicrap et al [2015] Lillicrap TP, Hunt JJ, Pritzel A, et al (2015) Continuous control with deep reinforcement learning. arXiv preprint arXiv:150902971 Lindenberg et al [2022] Lindenberg B, Nordqvist J, Lindahl KO (2022) Conjugated discrete distributions for distributional reinforcement learning. In: AAAI 2022, pp 7516–7524 Ma et al [2020] Ma X, Xia L, Zhou Z, et al (2020) Dsac: distributional soft actor critic for risk-sensitive reinforcement learning. arXiv preprint arXiv:200414547 Marzouk et al [2016] Marzouk Y, Moselhy T, Parno M, et al (2016) Sampling via measure transport: An introduction. Handbook of uncertainty quantification 1:2 Mnih et al [2015] Mnih V, Kavukcuoglu K, Silver D, et al (2015) Human-level control through deep reinforcement learning. nature 518(7540):529–533 MORIMURA [2010] MORIMURA T (2010) Parametric return density estimation for reinforcement learning. In: Conference on Uncertainty in Artificial Intelligence, 2010 Nam et al [2021] Nam DW, Kim Y, Park CY (2021) Gmac: A distributional perspective on actor-critic framework. In: ICML 2021, PMLR, pp 7927–7936 Ororbia and Kifer [2022] Ororbia A, Kifer D (2022) The neural coding framework for learning generative models. Nature communications 13(1):2064 Parno and Marzouk [2018] Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. 
Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Barth-Maron G, Hoffman MW, Budden D, et al (2018) Distributed distributional deterministic policy gradients. In: ICLR 2018 Bellemare et al [2017] Bellemare MG, Dabney W, Munos R (2017) A distributional perspective on reinforcement learning. In: ICML 2017, PMLR, pp 449–458 Bellemare et al [2023] Bellemare MG, Dabney W, Rowland M (2023) Distributional Reinforcement Learning. MIT Press, http://www.distributional-rl.org Burda et al [2019] Burda Y, Edwards H, Storkey A, et al (2019) Exploration by random network distillation. In: Seventh International Conference on Learning Representations, pp 1–17 Choi et al [2021] Choi J, Dance C, Kim Je, et al (2021) Risk-conditioned distributional soft actor-critic for risk-sensitive navigation. In: ICRA 2021, IEEE, pp 8337–8344 Chow et al [2015] Chow Y, Tamar A, Mannor S, et al (2015) Risk-sensitive and robust decision-making: a cvar optimization approach. Advances in neural information processing systems 28 Chow et al [2017] Chow Y, Ghavamzadeh M, Janson L, et al (2017) Risk-constrained reinforcement learning with percentile risk criteria. J Mach Learn Res 18(1):6070–6120 Creswell et al [2018] Creswell A, White T, Dumoulin V, et al (2018) Generative adversarial networks: An overview. IEEE signal processing magazine 35(1):53–65 Dabney et al [2018a] Dabney W, Ostrovski G, Silver D, et al (2018a) Implicit quantile networks for distributional reinforcement learning. In: ICML 2018, PMLR, pp 1096–1105 Dabney et al [2018b] Dabney W, Rowland M, Bellemare M, et al (2018b) Distributional reinforcement learning with quantile regression. 
In: AAAI 2018 Duan et al [2021] Duan J, Guan Y, Li SE, et al (2021) Distributional soft actor-critic: Off-policy reinforcement learning for addressing value estimation errors. IEEE transactions on neural networks and learning systems Engel et al [2005] Engel Y, Mannor S, Meir R (2005) Reinforcement learning with gaussian processes. In: Proceedings of the 22nd international conference on Machine learning, pp 201–208 Fujimoto et al [2018] Fujimoto S, Hoof H, Meger D (2018) Addressing function approximation error in actor-critic methods. In: ICML 2018, PMLR, pp 1587–1596 Goodfellow et al [2020] Goodfellow I, Pouget-Abadie J, Mirza M, et al (2020) Generative adversarial networks. Communications of the ACM 63(11):139–144 Haarnoja et al [2017] Haarnoja T, Tang H, Abbeel P, et al (2017) Reinforcement learning with deep energy-based policies. In: ICML 2017, PMLR, pp 1352–1361 Haarnoja et al [2018] Haarnoja T, Zhou A, Hartikainen K, et al (2018) Soft actor-critic algorithms and applications. arXiv preprint arXiv:181205905 Heess et al [2015] Heess N, Wayne G, Silver D, et al (2015) Learning continuous control policies by stochastic value gradients. Advances in neural information processing systems 28 Kingma and Welling [2013] Kingma DP, Welling M (2013) Auto-encoding variational bayes. arXiv preprint arXiv:13126114 Kingma et al [2014] Kingma DP, Mohamed S, Jimenez Rezende D, et al (2014) Semi-supervised learning with deep generative models. NIPS 2014 27 Konda and Tsitsiklis [1999] Konda V, Tsitsiklis J (1999) Actor-critic algorithms. Advances in neural information processing systems 12 Kuznetsov et al [2020] Kuznetsov A, Shvechikov P, Grishin A, et al (2020) Controlling overestimation bias with truncated mixture of continuous distributional quantile critics. In: International Conference on Machine Learning, PMLR, pp 5556–5566 Lillicrap et al [2015] Lillicrap TP, Hunt JJ, Pritzel A, et al (2015) Continuous control with deep reinforcement learning. arXiv preprint arXiv:150902971 Lindenberg et al [2022] Lindenberg B, Nordqvist J, Lindahl KO (2022) Conjugated discrete distributions for distributional reinforcement learning. In: AAAI 2022, pp 7516–7524 Ma et al [2020] Ma X, Xia L, Zhou Z, et al (2020) Dsac: distributional soft actor critic for risk-sensitive reinforcement learning. arXiv preprint arXiv:200414547 Marzouk et al [2016] Marzouk Y, Moselhy T, Parno M, et al (2016) Sampling via measure transport: An introduction. Handbook of uncertainty quantification 1:2 Mnih et al [2015] Mnih V, Kavukcuoglu K, Silver D, et al (2015) Human-level control through deep reinforcement learning. nature 518(7540):529–533 MORIMURA [2010] MORIMURA T (2010) Parametric return density estimation for reinforcement learning. In: Conference on Uncertainty in Artificial Intelligence, 2010 Nam et al [2021] Nam DW, Kim Y, Park CY (2021) Gmac: A distributional perspective on actor-critic framework. In: ICML 2021, PMLR, pp 7927–7936 Ororbia and Kifer [2022] Ororbia A, Kifer D (2022) The neural coding framework for learning generative models. Nature communications 13(1):2064 Parno and Marzouk [2018] Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. 
Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Bellemare MG, Dabney W, Munos R (2017) A distributional perspective on reinforcement learning. In: ICML 2017, PMLR, pp 449–458 Bellemare et al [2023] Bellemare MG, Dabney W, Rowland M (2023) Distributional Reinforcement Learning. MIT Press, http://www.distributional-rl.org Burda et al [2019] Burda Y, Edwards H, Storkey A, et al (2019) Exploration by random network distillation. In: Seventh International Conference on Learning Representations, pp 1–17 Choi et al [2021] Choi J, Dance C, Kim Je, et al (2021) Risk-conditioned distributional soft actor-critic for risk-sensitive navigation. In: ICRA 2021, IEEE, pp 8337–8344 Chow et al [2015] Chow Y, Tamar A, Mannor S, et al (2015) Risk-sensitive and robust decision-making: a cvar optimization approach. Advances in neural information processing systems 28 Chow et al [2017] Chow Y, Ghavamzadeh M, Janson L, et al (2017) Risk-constrained reinforcement learning with percentile risk criteria. J Mach Learn Res 18(1):6070–6120 Creswell et al [2018] Creswell A, White T, Dumoulin V, et al (2018) Generative adversarial networks: An overview. IEEE signal processing magazine 35(1):53–65 Dabney et al [2018a] Dabney W, Ostrovski G, Silver D, et al (2018a) Implicit quantile networks for distributional reinforcement learning. In: ICML 2018, PMLR, pp 1096–1105 Dabney et al [2018b] Dabney W, Rowland M, Bellemare M, et al (2018b) Distributional reinforcement learning with quantile regression. 
In: AAAI 2018 Duan et al [2021] Duan J, Guan Y, Li SE, et al (2021) Distributional soft actor-critic: Off-policy reinforcement learning for addressing value estimation errors. IEEE transactions on neural networks and learning systems Engel et al [2005] Engel Y, Mannor S, Meir R (2005) Reinforcement learning with gaussian processes. In: Proceedings of the 22nd international conference on Machine learning, pp 201–208 Fujimoto et al [2018] Fujimoto S, Hoof H, Meger D (2018) Addressing function approximation error in actor-critic methods. In: ICML 2018, PMLR, pp 1587–1596 Goodfellow et al [2020] Goodfellow I, Pouget-Abadie J, Mirza M, et al (2020) Generative adversarial networks. Communications of the ACM 63(11):139–144 Haarnoja et al [2017] Haarnoja T, Tang H, Abbeel P, et al (2017) Reinforcement learning with deep energy-based policies. In: ICML 2017, PMLR, pp 1352–1361 Haarnoja et al [2018] Haarnoja T, Zhou A, Hartikainen K, et al (2018) Soft actor-critic algorithms and applications. arXiv preprint arXiv:181205905 Heess et al [2015] Heess N, Wayne G, Silver D, et al (2015) Learning continuous control policies by stochastic value gradients. Advances in neural information processing systems 28 Kingma and Welling [2013] Kingma DP, Welling M (2013) Auto-encoding variational bayes. arXiv preprint arXiv:13126114 Kingma et al [2014] Kingma DP, Mohamed S, Jimenez Rezende D, et al (2014) Semi-supervised learning with deep generative models. NIPS 2014 27 Konda and Tsitsiklis [1999] Konda V, Tsitsiklis J (1999) Actor-critic algorithms. Advances in neural information processing systems 12 Kuznetsov et al [2020] Kuznetsov A, Shvechikov P, Grishin A, et al (2020) Controlling overestimation bias with truncated mixture of continuous distributional quantile critics. In: International Conference on Machine Learning, PMLR, pp 5556–5566 Lillicrap et al [2015] Lillicrap TP, Hunt JJ, Pritzel A, et al (2015) Continuous control with deep reinforcement learning. arXiv preprint arXiv:150902971 Lindenberg et al [2022] Lindenberg B, Nordqvist J, Lindahl KO (2022) Conjugated discrete distributions for distributional reinforcement learning. In: AAAI 2022, pp 7516–7524 Ma et al [2020] Ma X, Xia L, Zhou Z, et al (2020) Dsac: distributional soft actor critic for risk-sensitive reinforcement learning. arXiv preprint arXiv:200414547 Marzouk et al [2016] Marzouk Y, Moselhy T, Parno M, et al (2016) Sampling via measure transport: An introduction. Handbook of uncertainty quantification 1:2 Mnih et al [2015] Mnih V, Kavukcuoglu K, Silver D, et al (2015) Human-level control through deep reinforcement learning. nature 518(7540):529–533 MORIMURA [2010] MORIMURA T (2010) Parametric return density estimation for reinforcement learning. In: Conference on Uncertainty in Artificial Intelligence, 2010 Nam et al [2021] Nam DW, Kim Y, Park CY (2021) Gmac: A distributional perspective on actor-critic framework. In: ICML 2021, PMLR, pp 7927–7936 Ororbia and Kifer [2022] Ororbia A, Kifer D (2022) The neural coding framework for learning generative models. Nature communications 13(1):2064 Parno and Marzouk [2018] Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. 
Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Bellemare MG, Dabney W, Rowland M (2023) Distributional Reinforcement Learning. MIT Press, http://www.distributional-rl.org Burda et al [2019] Burda Y, Edwards H, Storkey A, et al (2019) Exploration by random network distillation. In: Seventh International Conference on Learning Representations, pp 1–17 Choi et al [2021] Choi J, Dance C, Kim Je, et al (2021) Risk-conditioned distributional soft actor-critic for risk-sensitive navigation. In: ICRA 2021, IEEE, pp 8337–8344 Chow et al [2015] Chow Y, Tamar A, Mannor S, et al (2015) Risk-sensitive and robust decision-making: a cvar optimization approach. Advances in neural information processing systems 28 Chow et al [2017] Chow Y, Ghavamzadeh M, Janson L, et al (2017) Risk-constrained reinforcement learning with percentile risk criteria. J Mach Learn Res 18(1):6070–6120 Creswell et al [2018] Creswell A, White T, Dumoulin V, et al (2018) Generative adversarial networks: An overview. IEEE signal processing magazine 35(1):53–65 Dabney et al [2018a] Dabney W, Ostrovski G, Silver D, et al (2018a) Implicit quantile networks for distributional reinforcement learning. In: ICML 2018, PMLR, pp 1096–1105 Dabney et al [2018b] Dabney W, Rowland M, Bellemare M, et al (2018b) Distributional reinforcement learning with quantile regression. 
Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Goodfellow I, Pouget-Abadie J, Mirza M, et al (2020) Generative adversarial networks. Communications of the ACM 63(11):139–144 Haarnoja et al [2017] Haarnoja T, Tang H, Abbeel P, et al (2017) Reinforcement learning with deep energy-based policies. In: ICML 2017, PMLR, pp 1352–1361 Haarnoja et al [2018] Haarnoja T, Zhou A, Hartikainen K, et al (2018) Soft actor-critic algorithms and applications. arXiv preprint arXiv:181205905 Heess et al [2015] Heess N, Wayne G, Silver D, et al (2015) Learning continuous control policies by stochastic value gradients. Advances in neural information processing systems 28 Kingma and Welling [2013] Kingma DP, Welling M (2013) Auto-encoding variational bayes. arXiv preprint arXiv:13126114 Kingma et al [2014] Kingma DP, Mohamed S, Jimenez Rezende D, et al (2014) Semi-supervised learning with deep generative models. NIPS 2014 27 Konda and Tsitsiklis [1999] Konda V, Tsitsiklis J (1999) Actor-critic algorithms. Advances in neural information processing systems 12 Kuznetsov et al [2020] Kuznetsov A, Shvechikov P, Grishin A, et al (2020) Controlling overestimation bias with truncated mixture of continuous distributional quantile critics. In: International Conference on Machine Learning, PMLR, pp 5556–5566 Lillicrap et al [2015] Lillicrap TP, Hunt JJ, Pritzel A, et al (2015) Continuous control with deep reinforcement learning. arXiv preprint arXiv:150902971 Lindenberg et al [2022] Lindenberg B, Nordqvist J, Lindahl KO (2022) Conjugated discrete distributions for distributional reinforcement learning. In: AAAI 2022, pp 7516–7524 Ma et al [2020] Ma X, Xia L, Zhou Z, et al (2020) Dsac: distributional soft actor critic for risk-sensitive reinforcement learning. arXiv preprint arXiv:200414547 Marzouk et al [2016] Marzouk Y, Moselhy T, Parno M, et al (2016) Sampling via measure transport: An introduction. Handbook of uncertainty quantification 1:2 Mnih et al [2015] Mnih V, Kavukcuoglu K, Silver D, et al (2015) Human-level control through deep reinforcement learning. nature 518(7540):529–533 MORIMURA [2010] MORIMURA T (2010) Parametric return density estimation for reinforcement learning. In: Conference on Uncertainty in Artificial Intelligence, 2010 Nam et al [2021] Nam DW, Kim Y, Park CY (2021) Gmac: A distributional perspective on actor-critic framework. In: ICML 2021, PMLR, pp 7927–7936 Ororbia and Kifer [2022] Ororbia A, Kifer D (2022) The neural coding framework for learning generative models. Nature communications 13(1):2064 Parno and Marzouk [2018] Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. 
Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Haarnoja T, Tang H, Abbeel P, et al (2017) Reinforcement learning with deep energy-based policies. In: ICML 2017, PMLR, pp 1352–1361 Haarnoja et al [2018] Haarnoja T, Zhou A, Hartikainen K, et al (2018) Soft actor-critic algorithms and applications. arXiv preprint arXiv:181205905 Heess et al [2015] Heess N, Wayne G, Silver D, et al (2015) Learning continuous control policies by stochastic value gradients. Advances in neural information processing systems 28 Kingma and Welling [2013] Kingma DP, Welling M (2013) Auto-encoding variational bayes. arXiv preprint arXiv:13126114 Kingma et al [2014] Kingma DP, Mohamed S, Jimenez Rezende D, et al (2014) Semi-supervised learning with deep generative models. NIPS 2014 27 Konda and Tsitsiklis [1999] Konda V, Tsitsiklis J (1999) Actor-critic algorithms. Advances in neural information processing systems 12 Kuznetsov et al [2020] Kuznetsov A, Shvechikov P, Grishin A, et al (2020) Controlling overestimation bias with truncated mixture of continuous distributional quantile critics. In: International Conference on Machine Learning, PMLR, pp 5556–5566 Lillicrap et al [2015] Lillicrap TP, Hunt JJ, Pritzel A, et al (2015) Continuous control with deep reinforcement learning. arXiv preprint arXiv:150902971 Lindenberg et al [2022] Lindenberg B, Nordqvist J, Lindahl KO (2022) Conjugated discrete distributions for distributional reinforcement learning. In: AAAI 2022, pp 7516–7524 Ma et al [2020] Ma X, Xia L, Zhou Z, et al (2020) Dsac: distributional soft actor critic for risk-sensitive reinforcement learning. arXiv preprint arXiv:200414547 Marzouk et al [2016] Marzouk Y, Moselhy T, Parno M, et al (2016) Sampling via measure transport: An introduction. 
Handbook of uncertainty quantification 1:2 Mnih et al [2015] Mnih V, Kavukcuoglu K, Silver D, et al (2015) Human-level control through deep reinforcement learning. nature 518(7540):529–533 MORIMURA [2010] MORIMURA T (2010) Parametric return density estimation for reinforcement learning. In: Conference on Uncertainty in Artificial Intelligence, 2010 Nam et al [2021] Nam DW, Kim Y, Park CY (2021) Gmac: A distributional perspective on actor-critic framework. In: ICML 2021, PMLR, pp 7927–7936 Ororbia and Kifer [2022] Ororbia A, Kifer D (2022) The neural coding framework for learning generative models. Nature communications 13(1):2064 Parno and Marzouk [2018] Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Haarnoja T, Zhou A, Hartikainen K, et al (2018) Soft actor-critic algorithms and applications. arXiv preprint arXiv:181205905 Heess et al [2015] Heess N, Wayne G, Silver D, et al (2015) Learning continuous control policies by stochastic value gradients. 
Advances in neural information processing systems 28 Kingma and Welling [2013] Kingma DP, Welling M (2013) Auto-encoding variational bayes. arXiv preprint arXiv:13126114 Kingma et al [2014] Kingma DP, Mohamed S, Jimenez Rezende D, et al (2014) Semi-supervised learning with deep generative models. NIPS 2014 27 Konda and Tsitsiklis [1999] Konda V, Tsitsiklis J (1999) Actor-critic algorithms. Advances in neural information processing systems 12 Kuznetsov et al [2020] Kuznetsov A, Shvechikov P, Grishin A, et al (2020) Controlling overestimation bias with truncated mixture of continuous distributional quantile critics. In: International Conference on Machine Learning, PMLR, pp 5556–5566 Lillicrap et al [2015] Lillicrap TP, Hunt JJ, Pritzel A, et al (2015) Continuous control with deep reinforcement learning. arXiv preprint arXiv:150902971 Lindenberg et al [2022] Lindenberg B, Nordqvist J, Lindahl KO (2022) Conjugated discrete distributions for distributional reinforcement learning. In: AAAI 2022, pp 7516–7524 Ma et al [2020] Ma X, Xia L, Zhou Z, et al (2020) Dsac: distributional soft actor critic for risk-sensitive reinforcement learning. arXiv preprint arXiv:200414547 Marzouk et al [2016] Marzouk Y, Moselhy T, Parno M, et al (2016) Sampling via measure transport: An introduction. Handbook of uncertainty quantification 1:2 Mnih et al [2015] Mnih V, Kavukcuoglu K, Silver D, et al (2015) Human-level control through deep reinforcement learning. nature 518(7540):529–533 MORIMURA [2010] MORIMURA T (2010) Parametric return density estimation for reinforcement learning. In: Conference on Uncertainty in Artificial Intelligence, 2010 Nam et al [2021] Nam DW, Kim Y, Park CY (2021) Gmac: A distributional perspective on actor-critic framework. In: ICML 2021, PMLR, pp 7927–7936 Ororbia and Kifer [2022] Ororbia A, Kifer D (2022) The neural coding framework for learning generative models. Nature communications 13(1):2064 Parno and Marzouk [2018] Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. 
In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Heess N, Wayne G, Silver D, et al (2015) Learning continuous control policies by stochastic value gradients. Advances in neural information processing systems 28 Kingma and Welling [2013] Kingma DP, Welling M (2013) Auto-encoding variational bayes. arXiv preprint arXiv:13126114 Kingma et al [2014] Kingma DP, Mohamed S, Jimenez Rezende D, et al (2014) Semi-supervised learning with deep generative models. NIPS 2014 27 Konda and Tsitsiklis [1999] Konda V, Tsitsiklis J (1999) Actor-critic algorithms. Advances in neural information processing systems 12 Kuznetsov et al [2020] Kuznetsov A, Shvechikov P, Grishin A, et al (2020) Controlling overestimation bias with truncated mixture of continuous distributional quantile critics. In: International Conference on Machine Learning, PMLR, pp 5556–5566 Lillicrap et al [2015] Lillicrap TP, Hunt JJ, Pritzel A, et al (2015) Continuous control with deep reinforcement learning. arXiv preprint arXiv:150902971 Lindenberg et al [2022] Lindenberg B, Nordqvist J, Lindahl KO (2022) Conjugated discrete distributions for distributional reinforcement learning. In: AAAI 2022, pp 7516–7524 Ma et al [2020] Ma X, Xia L, Zhou Z, et al (2020) Dsac: distributional soft actor critic for risk-sensitive reinforcement learning. arXiv preprint arXiv:200414547 Marzouk et al [2016] Marzouk Y, Moselhy T, Parno M, et al (2016) Sampling via measure transport: An introduction. Handbook of uncertainty quantification 1:2 Mnih et al [2015] Mnih V, Kavukcuoglu K, Silver D, et al (2015) Human-level control through deep reinforcement learning. nature 518(7540):529–533 MORIMURA [2010] MORIMURA T (2010) Parametric return density estimation for reinforcement learning. In: Conference on Uncertainty in Artificial Intelligence, 2010 Nam et al [2021] Nam DW, Kim Y, Park CY (2021) Gmac: A distributional perspective on actor-critic framework. In: ICML 2021, PMLR, pp 7927–7936 Ororbia and Kifer [2022] Ororbia A, Kifer D (2022) The neural coding framework for learning generative models. Nature communications 13(1):2064 Parno and Marzouk [2018] Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. 
Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Kingma DP, Welling M (2013) Auto-encoding variational bayes. arXiv preprint arXiv:13126114 Kingma et al [2014] Kingma DP, Mohamed S, Jimenez Rezende D, et al (2014) Semi-supervised learning with deep generative models. NIPS 2014 27 Konda and Tsitsiklis [1999] Konda V, Tsitsiklis J (1999) Actor-critic algorithms. Advances in neural information processing systems 12 Kuznetsov et al [2020] Kuznetsov A, Shvechikov P, Grishin A, et al (2020) Controlling overestimation bias with truncated mixture of continuous distributional quantile critics. In: International Conference on Machine Learning, PMLR, pp 5556–5566 Lillicrap et al [2015] Lillicrap TP, Hunt JJ, Pritzel A, et al (2015) Continuous control with deep reinforcement learning. arXiv preprint arXiv:150902971 Lindenberg et al [2022] Lindenberg B, Nordqvist J, Lindahl KO (2022) Conjugated discrete distributions for distributional reinforcement learning. In: AAAI 2022, pp 7516–7524 Ma et al [2020] Ma X, Xia L, Zhou Z, et al (2020) Dsac: distributional soft actor critic for risk-sensitive reinforcement learning. arXiv preprint arXiv:200414547 Marzouk et al [2016] Marzouk Y, Moselhy T, Parno M, et al (2016) Sampling via measure transport: An introduction. 
Handbook of uncertainty quantification 1:2 Mnih et al [2015] Mnih V, Kavukcuoglu K, Silver D, et al (2015) Human-level control through deep reinforcement learning. nature 518(7540):529–533 MORIMURA [2010] MORIMURA T (2010) Parametric return density estimation for reinforcement learning. In: Conference on Uncertainty in Artificial Intelligence, 2010 Nam et al [2021] Nam DW, Kim Y, Park CY (2021) Gmac: A distributional perspective on actor-critic framework. In: ICML 2021, PMLR, pp 7927–7936 Ororbia and Kifer [2022] Ororbia A, Kifer D (2022) The neural coding framework for learning generative models. Nature communications 13(1):2064 Parno and Marzouk [2018] Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Kingma DP, Mohamed S, Jimenez Rezende D, et al (2014) Semi-supervised learning with deep generative models. NIPS 2014 27 Konda and Tsitsiklis [1999] Konda V, Tsitsiklis J (1999) Actor-critic algorithms. 
Advances in neural information processing systems 12 Kuznetsov et al [2020] Kuznetsov A, Shvechikov P, Grishin A, et al (2020) Controlling overestimation bias with truncated mixture of continuous distributional quantile critics. In: International Conference on Machine Learning, PMLR, pp 5556–5566 Lillicrap et al [2015] Lillicrap TP, Hunt JJ, Pritzel A, et al (2015) Continuous control with deep reinforcement learning. arXiv preprint arXiv:150902971 Lindenberg et al [2022] Lindenberg B, Nordqvist J, Lindahl KO (2022) Conjugated discrete distributions for distributional reinforcement learning. In: AAAI 2022, pp 7516–7524 Ma et al [2020] Ma X, Xia L, Zhou Z, et al (2020) Dsac: distributional soft actor critic for risk-sensitive reinforcement learning. arXiv preprint arXiv:200414547 Marzouk et al [2016] Marzouk Y, Moselhy T, Parno M, et al (2016) Sampling via measure transport: An introduction. Handbook of uncertainty quantification 1:2 Mnih et al [2015] Mnih V, Kavukcuoglu K, Silver D, et al (2015) Human-level control through deep reinforcement learning. nature 518(7540):529–533 MORIMURA [2010] MORIMURA T (2010) Parametric return density estimation for reinforcement learning. In: Conference on Uncertainty in Artificial Intelligence, 2010 Nam et al [2021] Nam DW, Kim Y, Park CY (2021) Gmac: A distributional perspective on actor-critic framework. In: ICML 2021, PMLR, pp 7927–7936 Ororbia and Kifer [2022] Ororbia A, Kifer D (2022) The neural coding framework for learning generative models. Nature communications 13(1):2064 Parno and Marzouk [2018] Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. 
MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Konda V, Tsitsiklis J (1999) Actor-critic algorithms. Advances in neural information processing systems 12 Kuznetsov et al [2020] Kuznetsov A, Shvechikov P, Grishin A, et al (2020) Controlling overestimation bias with truncated mixture of continuous distributional quantile critics. In: International Conference on Machine Learning, PMLR, pp 5556–5566 Lillicrap et al [2015] Lillicrap TP, Hunt JJ, Pritzel A, et al (2015) Continuous control with deep reinforcement learning. arXiv preprint arXiv:150902971 Lindenberg et al [2022] Lindenberg B, Nordqvist J, Lindahl KO (2022) Conjugated discrete distributions for distributional reinforcement learning. In: AAAI 2022, pp 7516–7524 Ma et al [2020] Ma X, Xia L, Zhou Z, et al (2020) Dsac: distributional soft actor critic for risk-sensitive reinforcement learning. arXiv preprint arXiv:200414547 Marzouk et al [2016] Marzouk Y, Moselhy T, Parno M, et al (2016) Sampling via measure transport: An introduction. Handbook of uncertainty quantification 1:2 Mnih et al [2015] Mnih V, Kavukcuoglu K, Silver D, et al (2015) Human-level control through deep reinforcement learning. nature 518(7540):529–533 MORIMURA [2010] MORIMURA T (2010) Parametric return density estimation for reinforcement learning. In: Conference on Uncertainty in Artificial Intelligence, 2010 Nam et al [2021] Nam DW, Kim Y, Park CY (2021) Gmac: A distributional perspective on actor-critic framework. In: ICML 2021, PMLR, pp 7927–7936 Ororbia and Kifer [2022] Ororbia A, Kifer D (2022) The neural coding framework for learning generative models. Nature communications 13(1):2064 Parno and Marzouk [2018] Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. 
arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Kuznetsov A, Shvechikov P, Grishin A, et al (2020) Controlling overestimation bias with truncated mixture of continuous distributional quantile critics. In: International Conference on Machine Learning, PMLR, pp 5556–5566 Lillicrap et al [2015] Lillicrap TP, Hunt JJ, Pritzel A, et al (2015) Continuous control with deep reinforcement learning. arXiv preprint arXiv:150902971 Lindenberg et al [2022] Lindenberg B, Nordqvist J, Lindahl KO (2022) Conjugated discrete distributions for distributional reinforcement learning. In: AAAI 2022, pp 7516–7524 Ma et al [2020] Ma X, Xia L, Zhou Z, et al (2020) Dsac: distributional soft actor critic for risk-sensitive reinforcement learning. arXiv preprint arXiv:200414547 Marzouk et al [2016] Marzouk Y, Moselhy T, Parno M, et al (2016) Sampling via measure transport: An introduction. Handbook of uncertainty quantification 1:2 Mnih et al [2015] Mnih V, Kavukcuoglu K, Silver D, et al (2015) Human-level control through deep reinforcement learning. nature 518(7540):529–533 MORIMURA [2010] MORIMURA T (2010) Parametric return density estimation for reinforcement learning. In: Conference on Uncertainty in Artificial Intelligence, 2010 Nam et al [2021] Nam DW, Kim Y, Park CY (2021) Gmac: A distributional perspective on actor-critic framework. In: ICML 2021, PMLR, pp 7927–7936 Ororbia and Kifer [2022] Ororbia A, Kifer D (2022) The neural coding framework for learning generative models. Nature communications 13(1):2064 Parno and Marzouk [2018] Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. 
Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Lillicrap TP, Hunt JJ, Pritzel A, et al (2015) Continuous control with deep reinforcement learning. arXiv preprint arXiv:150902971 Lindenberg et al [2022] Lindenberg B, Nordqvist J, Lindahl KO (2022) Conjugated discrete distributions for distributional reinforcement learning. In: AAAI 2022, pp 7516–7524 Ma et al [2020] Ma X, Xia L, Zhou Z, et al (2020) Dsac: distributional soft actor critic for risk-sensitive reinforcement learning. arXiv preprint arXiv:200414547 Marzouk et al [2016] Marzouk Y, Moselhy T, Parno M, et al (2016) Sampling via measure transport: An introduction. Handbook of uncertainty quantification 1:2 Mnih et al [2015] Mnih V, Kavukcuoglu K, Silver D, et al (2015) Human-level control through deep reinforcement learning. nature 518(7540):529–533 MORIMURA [2010] MORIMURA T (2010) Parametric return density estimation for reinforcement learning. In: Conference on Uncertainty in Artificial Intelligence, 2010 Nam et al [2021] Nam DW, Kim Y, Park CY (2021) Gmac: A distributional perspective on actor-critic framework. In: ICML 2021, PMLR, pp 7927–7936 Ororbia and Kifer [2022] Ororbia A, Kifer D (2022) The neural coding framework for learning generative models. Nature communications 13(1):2064 Parno and Marzouk [2018] Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. 
Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Lindenberg B, Nordqvist J, Lindahl KO (2022) Conjugated discrete distributions for distributional reinforcement learning. In: AAAI 2022, pp 7516–7524 Ma et al [2020] Ma X, Xia L, Zhou Z, et al (2020) Dsac: distributional soft actor critic for risk-sensitive reinforcement learning. arXiv preprint arXiv:200414547 Marzouk et al [2016] Marzouk Y, Moselhy T, Parno M, et al (2016) Sampling via measure transport: An introduction. Handbook of uncertainty quantification 1:2 Mnih et al [2015] Mnih V, Kavukcuoglu K, Silver D, et al (2015) Human-level control through deep reinforcement learning. nature 518(7540):529–533 MORIMURA [2010] MORIMURA T (2010) Parametric return density estimation for reinforcement learning. In: Conference on Uncertainty in Artificial Intelligence, 2010 Nam et al [2021] Nam DW, Kim Y, Park CY (2021) Gmac: A distributional perspective on actor-critic framework. In: ICML 2021, PMLR, pp 7927–7936 Ororbia and Kifer [2022] Ororbia A, Kifer D (2022) The neural coding framework for learning generative models. Nature communications 13(1):2064 Parno and Marzouk [2018] Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. 
Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Ma X, Xia L, Zhou Z, et al (2020) Dsac: distributional soft actor critic for risk-sensitive reinforcement learning. arXiv preprint arXiv:200414547 Marzouk et al [2016] Marzouk Y, Moselhy T, Parno M, et al (2016) Sampling via measure transport: An introduction. Handbook of uncertainty quantification 1:2 Mnih et al [2015] Mnih V, Kavukcuoglu K, Silver D, et al (2015) Human-level control through deep reinforcement learning. nature 518(7540):529–533 MORIMURA [2010] MORIMURA T (2010) Parametric return density estimation for reinforcement learning. In: Conference on Uncertainty in Artificial Intelligence, 2010 Nam et al [2021] Nam DW, Kim Y, Park CY (2021) Gmac: A distributional perspective on actor-critic framework. In: ICML 2021, PMLR, pp 7927–7936 Ororbia and Kifer [2022] Ororbia A, Kifer D (2022) The neural coding framework for learning generative models. Nature communications 13(1):2064 Parno and Marzouk [2018] Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. 
Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Marzouk Y, Moselhy T, Parno M, et al (2016) Sampling via measure transport: An introduction. Handbook of uncertainty quantification 1:2 Mnih et al [2015] Mnih V, Kavukcuoglu K, Silver D, et al (2015) Human-level control through deep reinforcement learning. nature 518(7540):529–533 MORIMURA [2010] MORIMURA T (2010) Parametric return density estimation for reinforcement learning. In: Conference on Uncertainty in Artificial Intelligence, 2010 Nam et al [2021] Nam DW, Kim Y, Park CY (2021) Gmac: A distributional perspective on actor-critic framework. In: ICML 2021, PMLR, pp 7927–7936 Ororbia and Kifer [2022] Ororbia A, Kifer D (2022) The neural coding framework for learning generative models. Nature communications 13(1):2064 Parno and Marzouk [2018] Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. 
Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Burda Y, Edwards H, Storkey A, et al (2019) Exploration by random network distillation. In: Seventh International Conference on Learning Representations, pp 1–17 Choi et al [2021] Choi J, Dance C, Kim Je, et al (2021) Risk-conditioned distributional soft actor-critic for risk-sensitive navigation. In: ICRA 2021, IEEE, pp 8337–8344 Chow et al [2015] Chow Y, Tamar A, Mannor S, et al (2015) Risk-sensitive and robust decision-making: a cvar optimization approach. Advances in neural information processing systems 28 Chow et al [2017] Chow Y, Ghavamzadeh M, Janson L, et al (2017) Risk-constrained reinforcement learning with percentile risk criteria. J Mach Learn Res 18(1):6070–6120 Creswell et al [2018] Creswell A, White T, Dumoulin V, et al (2018) Generative adversarial networks: An overview. IEEE signal processing magazine 35(1):53–65 Dabney et al [2018a] Dabney W, Ostrovski G, Silver D, et al (2018a) Implicit quantile networks for distributional reinforcement learning. In: ICML 2018, PMLR, pp 1096–1105 Dabney et al [2018b] Dabney W, Rowland M, Bellemare M, et al (2018b) Distributional reinforcement learning with quantile regression. In: AAAI 2018 Duan et al [2021] Duan J, Guan Y, Li SE, et al (2021) Distributional soft actor-critic: Off-policy reinforcement learning for addressing value estimation errors. IEEE transactions on neural networks and learning systems Engel et al [2005] Engel Y, Mannor S, Meir R (2005) Reinforcement learning with gaussian processes. In: Proceedings of the 22nd international conference on Machine learning, pp 201–208 Fujimoto et al [2018] Fujimoto S, Hoof H, Meger D (2018) Addressing function approximation error in actor-critic methods. In: ICML 2018, PMLR, pp 1587–1596 Goodfellow et al [2020] Goodfellow I, Pouget-Abadie J, Mirza M, et al (2020) Generative adversarial networks. Communications of the ACM 63(11):139–144 Haarnoja et al [2017] Haarnoja T, Tang H, Abbeel P, et al (2017) Reinforcement learning with deep energy-based policies. In: ICML 2017, PMLR, pp 1352–1361 Haarnoja et al [2018] Haarnoja T, Zhou A, Hartikainen K, et al (2018) Soft actor-critic algorithms and applications. 
arXiv preprint arXiv:181205905 Heess et al [2015] Heess N, Wayne G, Silver D, et al (2015) Learning continuous control policies by stochastic value gradients. Advances in neural information processing systems 28 Kingma and Welling [2013] Kingma DP, Welling M (2013) Auto-encoding variational bayes. arXiv preprint arXiv:13126114 Kingma et al [2014] Kingma DP, Mohamed S, Jimenez Rezende D, et al (2014) Semi-supervised learning with deep generative models. NIPS 2014 27 Konda and Tsitsiklis [1999] Konda V, Tsitsiklis J (1999) Actor-critic algorithms. Advances in neural information processing systems 12 Kuznetsov et al [2020] Kuznetsov A, Shvechikov P, Grishin A, et al (2020) Controlling overestimation bias with truncated mixture of continuous distributional quantile critics. In: International Conference on Machine Learning, PMLR, pp 5556–5566 Lillicrap et al [2015] Lillicrap TP, Hunt JJ, Pritzel A, et al (2015) Continuous control with deep reinforcement learning. arXiv preprint arXiv:150902971 Lindenberg et al [2022] Lindenberg B, Nordqvist J, Lindahl KO (2022) Conjugated discrete distributions for distributional reinforcement learning. In: AAAI 2022, pp 7516–7524 Ma et al [2020] Ma X, Xia L, Zhou Z, et al (2020) Dsac: distributional soft actor critic for risk-sensitive reinforcement learning. arXiv preprint arXiv:200414547 Marzouk et al [2016] Marzouk Y, Moselhy T, Parno M, et al (2016) Sampling via measure transport: An introduction. Handbook of uncertainty quantification 1:2 Mnih et al [2015] Mnih V, Kavukcuoglu K, Silver D, et al (2015) Human-level control through deep reinforcement learning. nature 518(7540):529–533 MORIMURA [2010] MORIMURA T (2010) Parametric return density estimation for reinforcement learning. In: Conference on Uncertainty in Artificial Intelligence, 2010 Nam et al [2021] Nam DW, Kim Y, Park CY (2021) Gmac: A distributional perspective on actor-critic framework. In: ICML 2021, PMLR, pp 7927–7936 Ororbia and Kifer [2022] Ororbia A, Kifer D (2022) The neural coding framework for learning generative models. Nature communications 13(1):2064 Parno and Marzouk [2018] Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. 
arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Choi J, Dance C, Kim Je, et al (2021) Risk-conditioned distributional soft actor-critic for risk-sensitive navigation. In: ICRA 2021, IEEE, pp 8337–8344 Chow et al [2015] Chow Y, Tamar A, Mannor S, et al (2015) Risk-sensitive and robust decision-making: a cvar optimization approach. Advances in neural information processing systems 28 Chow et al [2017] Chow Y, Ghavamzadeh M, Janson L, et al (2017) Risk-constrained reinforcement learning with percentile risk criteria. J Mach Learn Res 18(1):6070–6120 Creswell et al [2018] Creswell A, White T, Dumoulin V, et al (2018) Generative adversarial networks: An overview. IEEE signal processing magazine 35(1):53–65 Dabney et al [2018a] Dabney W, Ostrovski G, Silver D, et al (2018a) Implicit quantile networks for distributional reinforcement learning. In: ICML 2018, PMLR, pp 1096–1105 Dabney et al [2018b] Dabney W, Rowland M, Bellemare M, et al (2018b) Distributional reinforcement learning with quantile regression. In: AAAI 2018 Duan et al [2021] Duan J, Guan Y, Li SE, et al (2021) Distributional soft actor-critic: Off-policy reinforcement learning for addressing value estimation errors. IEEE transactions on neural networks and learning systems Engel et al [2005] Engel Y, Mannor S, Meir R (2005) Reinforcement learning with gaussian processes. In: Proceedings of the 22nd international conference on Machine learning, pp 201–208 Fujimoto et al [2018] Fujimoto S, Hoof H, Meger D (2018) Addressing function approximation error in actor-critic methods. In: ICML 2018, PMLR, pp 1587–1596 Goodfellow et al [2020] Goodfellow I, Pouget-Abadie J, Mirza M, et al (2020) Generative adversarial networks. Communications of the ACM 63(11):139–144 Haarnoja et al [2017] Haarnoja T, Tang H, Abbeel P, et al (2017) Reinforcement learning with deep energy-based policies. In: ICML 2017, PMLR, pp 1352–1361 Haarnoja et al [2018] Haarnoja T, Zhou A, Hartikainen K, et al (2018) Soft actor-critic algorithms and applications. arXiv preprint arXiv:181205905 Heess et al [2015] Heess N, Wayne G, Silver D, et al (2015) Learning continuous control policies by stochastic value gradients. Advances in neural information processing systems 28 Kingma and Welling [2013] Kingma DP, Welling M (2013) Auto-encoding variational bayes. 
arXiv preprint arXiv:13126114 Kingma et al [2014] Kingma DP, Mohamed S, Jimenez Rezende D, et al (2014) Semi-supervised learning with deep generative models. NIPS 2014 27 Konda and Tsitsiklis [1999] Konda V, Tsitsiklis J (1999) Actor-critic algorithms. Advances in neural information processing systems 12 Kuznetsov et al [2020] Kuznetsov A, Shvechikov P, Grishin A, et al (2020) Controlling overestimation bias with truncated mixture of continuous distributional quantile critics. In: International Conference on Machine Learning, PMLR, pp 5556–5566 Lillicrap et al [2015] Lillicrap TP, Hunt JJ, Pritzel A, et al (2015) Continuous control with deep reinforcement learning. arXiv preprint arXiv:150902971 Lindenberg et al [2022] Lindenberg B, Nordqvist J, Lindahl KO (2022) Conjugated discrete distributions for distributional reinforcement learning. In: AAAI 2022, pp 7516–7524 Ma et al [2020] Ma X, Xia L, Zhou Z, et al (2020) Dsac: distributional soft actor critic for risk-sensitive reinforcement learning. arXiv preprint arXiv:200414547 Marzouk et al [2016] Marzouk Y, Moselhy T, Parno M, et al (2016) Sampling via measure transport: An introduction. Handbook of uncertainty quantification 1:2 Mnih et al [2015] Mnih V, Kavukcuoglu K, Silver D, et al (2015) Human-level control through deep reinforcement learning. nature 518(7540):529–533 MORIMURA [2010] MORIMURA T (2010) Parametric return density estimation for reinforcement learning. In: Conference on Uncertainty in Artificial Intelligence, 2010 Nam et al [2021] Nam DW, Kim Y, Park CY (2021) Gmac: A distributional perspective on actor-critic framework. In: ICML 2021, PMLR, pp 7927–7936 Ororbia and Kifer [2022] Ororbia A, Kifer D (2022) The neural coding framework for learning generative models. Nature communications 13(1):2064 Parno and Marzouk [2018] Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. 
In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Chow Y, Tamar A, Mannor S, et al (2015) Risk-sensitive and robust decision-making: a cvar optimization approach. Advances in neural information processing systems 28 Chow et al [2017] Chow Y, Ghavamzadeh M, Janson L, et al (2017) Risk-constrained reinforcement learning with percentile risk criteria. J Mach Learn Res 18(1):6070–6120 Creswell et al [2018] Creswell A, White T, Dumoulin V, et al (2018) Generative adversarial networks: An overview. IEEE signal processing magazine 35(1):53–65 Dabney et al [2018a] Dabney W, Ostrovski G, Silver D, et al (2018a) Implicit quantile networks for distributional reinforcement learning. In: ICML 2018, PMLR, pp 1096–1105 Dabney et al [2018b] Dabney W, Rowland M, Bellemare M, et al (2018b) Distributional reinforcement learning with quantile regression. In: AAAI 2018 Duan et al [2021] Duan J, Guan Y, Li SE, et al (2021) Distributional soft actor-critic: Off-policy reinforcement learning for addressing value estimation errors. IEEE transactions on neural networks and learning systems Engel et al [2005] Engel Y, Mannor S, Meir R (2005) Reinforcement learning with gaussian processes. In: Proceedings of the 22nd international conference on Machine learning, pp 201–208 Fujimoto et al [2018] Fujimoto S, Hoof H, Meger D (2018) Addressing function approximation error in actor-critic methods. In: ICML 2018, PMLR, pp 1587–1596 Goodfellow et al [2020] Goodfellow I, Pouget-Abadie J, Mirza M, et al (2020) Generative adversarial networks. Communications of the ACM 63(11):139–144 Haarnoja et al [2017] Haarnoja T, Tang H, Abbeel P, et al (2017) Reinforcement learning with deep energy-based policies. In: ICML 2017, PMLR, pp 1352–1361 Haarnoja et al [2018] Haarnoja T, Zhou A, Hartikainen K, et al (2018) Soft actor-critic algorithms and applications. arXiv preprint arXiv:181205905 Heess et al [2015] Heess N, Wayne G, Silver D, et al (2015) Learning continuous control policies by stochastic value gradients. Advances in neural information processing systems 28 Kingma and Welling [2013] Kingma DP, Welling M (2013) Auto-encoding variational bayes. arXiv preprint arXiv:13126114 Kingma et al [2014] Kingma DP, Mohamed S, Jimenez Rezende D, et al (2014) Semi-supervised learning with deep generative models. NIPS 2014 27 Konda and Tsitsiklis [1999] Konda V, Tsitsiklis J (1999) Actor-critic algorithms. Advances in neural information processing systems 12 Kuznetsov et al [2020] Kuznetsov A, Shvechikov P, Grishin A, et al (2020) Controlling overestimation bias with truncated mixture of continuous distributional quantile critics. 
In: International Conference on Machine Learning, PMLR, pp 5556–5566 Lillicrap et al [2015] Lillicrap TP, Hunt JJ, Pritzel A, et al (2015) Continuous control with deep reinforcement learning. arXiv preprint arXiv:150902971 Lindenberg et al [2022] Lindenberg B, Nordqvist J, Lindahl KO (2022) Conjugated discrete distributions for distributional reinforcement learning. In: AAAI 2022, pp 7516–7524 Ma et al [2020] Ma X, Xia L, Zhou Z, et al (2020) Dsac: distributional soft actor critic for risk-sensitive reinforcement learning. arXiv preprint arXiv:200414547 Marzouk et al [2016] Marzouk Y, Moselhy T, Parno M, et al (2016) Sampling via measure transport: An introduction. Handbook of uncertainty quantification 1:2 Mnih et al [2015] Mnih V, Kavukcuoglu K, Silver D, et al (2015) Human-level control through deep reinforcement learning. nature 518(7540):529–533 MORIMURA [2010] MORIMURA T (2010) Parametric return density estimation for reinforcement learning. In: Conference on Uncertainty in Artificial Intelligence, 2010 Nam et al [2021] Nam DW, Kim Y, Park CY (2021) Gmac: A distributional perspective on actor-critic framework. In: ICML 2021, PMLR, pp 7927–7936 Ororbia and Kifer [2022] Ororbia A, Kifer D (2022) The neural coding framework for learning generative models. Nature communications 13(1):2064 Parno and Marzouk [2018] Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. 
NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Chow Y, Ghavamzadeh M, Janson L, et al (2017) Risk-constrained reinforcement learning with percentile risk criteria. J Mach Learn Res 18(1):6070–6120 Creswell et al [2018] Creswell A, White T, Dumoulin V, et al (2018) Generative adversarial networks: An overview. IEEE signal processing magazine 35(1):53–65 Dabney et al [2018a] Dabney W, Ostrovski G, Silver D, et al (2018a) Implicit quantile networks for distributional reinforcement learning. In: ICML 2018, PMLR, pp 1096–1105 Dabney et al [2018b] Dabney W, Rowland M, Bellemare M, et al (2018b) Distributional reinforcement learning with quantile regression. In: AAAI 2018 Duan et al [2021] Duan J, Guan Y, Li SE, et al (2021) Distributional soft actor-critic: Off-policy reinforcement learning for addressing value estimation errors. IEEE transactions on neural networks and learning systems Engel et al [2005] Engel Y, Mannor S, Meir R (2005) Reinforcement learning with gaussian processes. In: Proceedings of the 22nd international conference on Machine learning, pp 201–208 Fujimoto et al [2018] Fujimoto S, Hoof H, Meger D (2018) Addressing function approximation error in actor-critic methods. In: ICML 2018, PMLR, pp 1587–1596 Goodfellow et al [2020] Goodfellow I, Pouget-Abadie J, Mirza M, et al (2020) Generative adversarial networks. Communications of the ACM 63(11):139–144 Haarnoja et al [2017] Haarnoja T, Tang H, Abbeel P, et al (2017) Reinforcement learning with deep energy-based policies. In: ICML 2017, PMLR, pp 1352–1361 Haarnoja et al [2018] Haarnoja T, Zhou A, Hartikainen K, et al (2018) Soft actor-critic algorithms and applications. arXiv preprint arXiv:181205905 Heess et al [2015] Heess N, Wayne G, Silver D, et al (2015) Learning continuous control policies by stochastic value gradients. Advances in neural information processing systems 28 Kingma and Welling [2013] Kingma DP, Welling M (2013) Auto-encoding variational bayes. arXiv preprint arXiv:13126114 Kingma et al [2014] Kingma DP, Mohamed S, Jimenez Rezende D, et al (2014) Semi-supervised learning with deep generative models. NIPS 2014 27 Konda and Tsitsiklis [1999] Konda V, Tsitsiklis J (1999) Actor-critic algorithms. Advances in neural information processing systems 12 Kuznetsov et al [2020] Kuznetsov A, Shvechikov P, Grishin A, et al (2020) Controlling overestimation bias with truncated mixture of continuous distributional quantile critics. In: International Conference on Machine Learning, PMLR, pp 5556–5566 Lillicrap et al [2015] Lillicrap TP, Hunt JJ, Pritzel A, et al (2015) Continuous control with deep reinforcement learning. arXiv preprint arXiv:150902971 Lindenberg et al [2022] Lindenberg B, Nordqvist J, Lindahl KO (2022) Conjugated discrete distributions for distributional reinforcement learning. In: AAAI 2022, pp 7516–7524 Ma et al [2020] Ma X, Xia L, Zhou Z, et al (2020) Dsac: distributional soft actor critic for risk-sensitive reinforcement learning. arXiv preprint arXiv:200414547 Marzouk et al [2016] Marzouk Y, Moselhy T, Parno M, et al (2016) Sampling via measure transport: An introduction. Handbook of uncertainty quantification 1:2 Mnih et al [2015] Mnih V, Kavukcuoglu K, Silver D, et al (2015) Human-level control through deep reinforcement learning. 
nature 518(7540):529–533 MORIMURA [2010] MORIMURA T (2010) Parametric return density estimation for reinforcement learning. In: Conference on Uncertainty in Artificial Intelligence, 2010 Nam et al [2021] Nam DW, Kim Y, Park CY (2021) Gmac: A distributional perspective on actor-critic framework. In: ICML 2021, PMLR, pp 7927–7936 Ororbia and Kifer [2022] Ororbia A, Kifer D (2022) The neural coding framework for learning generative models. Nature communications 13(1):2064 Parno and Marzouk [2018] Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Creswell A, White T, Dumoulin V, et al (2018) Generative adversarial networks: An overview. IEEE signal processing magazine 35(1):53–65 Dabney et al [2018a] Dabney W, Ostrovski G, Silver D, et al (2018a) Implicit quantile networks for distributional reinforcement learning. In: ICML 2018, PMLR, pp 1096–1105 Dabney et al [2018b] Dabney W, Rowland M, Bellemare M, et al (2018b) Distributional reinforcement learning with quantile regression. 
In: AAAI 2018 Duan et al [2021] Duan J, Guan Y, Li SE, et al (2021) Distributional soft actor-critic: Off-policy reinforcement learning for addressing value estimation errors. IEEE transactions on neural networks and learning systems Engel et al [2005] Engel Y, Mannor S, Meir R (2005) Reinforcement learning with gaussian processes. In: Proceedings of the 22nd international conference on Machine learning, pp 201–208 Fujimoto et al [2018] Fujimoto S, Hoof H, Meger D (2018) Addressing function approximation error in actor-critic methods. In: ICML 2018, PMLR, pp 1587–1596 Goodfellow et al [2020] Goodfellow I, Pouget-Abadie J, Mirza M, et al (2020) Generative adversarial networks. Communications of the ACM 63(11):139–144 Haarnoja et al [2017] Haarnoja T, Tang H, Abbeel P, et al (2017) Reinforcement learning with deep energy-based policies. In: ICML 2017, PMLR, pp 1352–1361 Haarnoja et al [2018] Haarnoja T, Zhou A, Hartikainen K, et al (2018) Soft actor-critic algorithms and applications. arXiv preprint arXiv:181205905 Heess et al [2015] Heess N, Wayne G, Silver D, et al (2015) Learning continuous control policies by stochastic value gradients. Advances in neural information processing systems 28 Kingma and Welling [2013] Kingma DP, Welling M (2013) Auto-encoding variational bayes. arXiv preprint arXiv:13126114 Kingma et al [2014] Kingma DP, Mohamed S, Jimenez Rezende D, et al (2014) Semi-supervised learning with deep generative models. NIPS 2014 27 Konda and Tsitsiklis [1999] Konda V, Tsitsiklis J (1999) Actor-critic algorithms. Advances in neural information processing systems 12 Kuznetsov et al [2020] Kuznetsov A, Shvechikov P, Grishin A, et al (2020) Controlling overestimation bias with truncated mixture of continuous distributional quantile critics. In: International Conference on Machine Learning, PMLR, pp 5556–5566 Lillicrap et al [2015] Lillicrap TP, Hunt JJ, Pritzel A, et al (2015) Continuous control with deep reinforcement learning. arXiv preprint arXiv:150902971 Lindenberg et al [2022] Lindenberg B, Nordqvist J, Lindahl KO (2022) Conjugated discrete distributions for distributional reinforcement learning. In: AAAI 2022, pp 7516–7524 Ma et al [2020] Ma X, Xia L, Zhou Z, et al (2020) Dsac: distributional soft actor critic for risk-sensitive reinforcement learning. arXiv preprint arXiv:200414547 Marzouk et al [2016] Marzouk Y, Moselhy T, Parno M, et al (2016) Sampling via measure transport: An introduction. Handbook of uncertainty quantification 1:2 Mnih et al [2015] Mnih V, Kavukcuoglu K, Silver D, et al (2015) Human-level control through deep reinforcement learning. nature 518(7540):529–533 MORIMURA [2010] MORIMURA T (2010) Parametric return density estimation for reinforcement learning. In: Conference on Uncertainty in Artificial Intelligence, 2010 Nam et al [2021] Nam DW, Kim Y, Park CY (2021) Gmac: A distributional perspective on actor-critic framework. In: ICML 2021, PMLR, pp 7927–7936 Ororbia and Kifer [2022] Ororbia A, Kifer D (2022) The neural coding framework for learning generative models. Nature communications 13(1):2064 Parno and Marzouk [2018] Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. 
Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Dabney W, Ostrovski G, Silver D, et al (2018a) Implicit quantile networks for distributional reinforcement learning. In: ICML 2018, PMLR, pp 1096–1105 Dabney et al [2018b] Dabney W, Rowland M, Bellemare M, et al (2018b) Distributional reinforcement learning with quantile regression. In: AAAI 2018 Duan et al [2021] Duan J, Guan Y, Li SE, et al (2021) Distributional soft actor-critic: Off-policy reinforcement learning for addressing value estimation errors. IEEE transactions on neural networks and learning systems Engel et al [2005] Engel Y, Mannor S, Meir R (2005) Reinforcement learning with gaussian processes. In: Proceedings of the 22nd international conference on Machine learning, pp 201–208 Fujimoto et al [2018] Fujimoto S, Hoof H, Meger D (2018) Addressing function approximation error in actor-critic methods. In: ICML 2018, PMLR, pp 1587–1596 Goodfellow et al [2020] Goodfellow I, Pouget-Abadie J, Mirza M, et al (2020) Generative adversarial networks. Communications of the ACM 63(11):139–144 Haarnoja et al [2017] Haarnoja T, Tang H, Abbeel P, et al (2017) Reinforcement learning with deep energy-based policies. In: ICML 2017, PMLR, pp 1352–1361 Haarnoja et al [2018] Haarnoja T, Zhou A, Hartikainen K, et al (2018) Soft actor-critic algorithms and applications. arXiv preprint arXiv:181205905 Heess et al [2015] Heess N, Wayne G, Silver D, et al (2015) Learning continuous control policies by stochastic value gradients. 
Advances in neural information processing systems 28 Kingma and Welling [2013] Kingma DP, Welling M (2013) Auto-encoding variational bayes. arXiv preprint arXiv:13126114 Kingma et al [2014] Kingma DP, Mohamed S, Jimenez Rezende D, et al (2014) Semi-supervised learning with deep generative models. NIPS 2014 27 Konda and Tsitsiklis [1999] Konda V, Tsitsiklis J (1999) Actor-critic algorithms. Advances in neural information processing systems 12 Kuznetsov et al [2020] Kuznetsov A, Shvechikov P, Grishin A, et al (2020) Controlling overestimation bias with truncated mixture of continuous distributional quantile critics. In: International Conference on Machine Learning, PMLR, pp 5556–5566 Lillicrap et al [2015] Lillicrap TP, Hunt JJ, Pritzel A, et al (2015) Continuous control with deep reinforcement learning. arXiv preprint arXiv:150902971 Lindenberg et al [2022] Lindenberg B, Nordqvist J, Lindahl KO (2022) Conjugated discrete distributions for distributional reinforcement learning. In: AAAI 2022, pp 7516–7524 Ma et al [2020] Ma X, Xia L, Zhou Z, et al (2020) Dsac: distributional soft actor critic for risk-sensitive reinforcement learning. arXiv preprint arXiv:200414547 Marzouk et al [2016] Marzouk Y, Moselhy T, Parno M, et al (2016) Sampling via measure transport: An introduction. Handbook of uncertainty quantification 1:2 Mnih et al [2015] Mnih V, Kavukcuoglu K, Silver D, et al (2015) Human-level control through deep reinforcement learning. nature 518(7540):529–533 MORIMURA [2010] MORIMURA T (2010) Parametric return density estimation for reinforcement learning. In: Conference on Uncertainty in Artificial Intelligence, 2010 Nam et al [2021] Nam DW, Kim Y, Park CY (2021) Gmac: A distributional perspective on actor-critic framework. In: ICML 2021, PMLR, pp 7927–7936 Ororbia and Kifer [2022] Ororbia A, Kifer D (2022) The neural coding framework for learning generative models. Nature communications 13(1):2064 Parno and Marzouk [2018] Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. 
In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Dabney W, Rowland M, Bellemare M, et al (2018b) Distributional reinforcement learning with quantile regression. In: AAAI 2018 Duan et al [2021] Duan J, Guan Y, Li SE, et al (2021) Distributional soft actor-critic: Off-policy reinforcement learning for addressing value estimation errors. IEEE transactions on neural networks and learning systems Engel et al [2005] Engel Y, Mannor S, Meir R (2005) Reinforcement learning with gaussian processes. In: Proceedings of the 22nd international conference on Machine learning, pp 201–208 Fujimoto et al [2018] Fujimoto S, Hoof H, Meger D (2018) Addressing function approximation error in actor-critic methods. In: ICML 2018, PMLR, pp 1587–1596 Goodfellow et al [2020] Goodfellow I, Pouget-Abadie J, Mirza M, et al (2020) Generative adversarial networks. Communications of the ACM 63(11):139–144 Haarnoja et al [2017] Haarnoja T, Tang H, Abbeel P, et al (2017) Reinforcement learning with deep energy-based policies. In: ICML 2017, PMLR, pp 1352–1361 Haarnoja et al [2018] Haarnoja T, Zhou A, Hartikainen K, et al (2018) Soft actor-critic algorithms and applications. arXiv preprint arXiv:181205905 Heess et al [2015] Heess N, Wayne G, Silver D, et al (2015) Learning continuous control policies by stochastic value gradients. Advances in neural information processing systems 28 Kingma and Welling [2013] Kingma DP, Welling M (2013) Auto-encoding variational bayes. arXiv preprint arXiv:13126114 Kingma et al [2014] Kingma DP, Mohamed S, Jimenez Rezende D, et al (2014) Semi-supervised learning with deep generative models. NIPS 2014 27 Konda and Tsitsiklis [1999] Konda V, Tsitsiklis J (1999) Actor-critic algorithms. Advances in neural information processing systems 12 Kuznetsov et al [2020] Kuznetsov A, Shvechikov P, Grishin A, et al (2020) Controlling overestimation bias with truncated mixture of continuous distributional quantile critics. In: International Conference on Machine Learning, PMLR, pp 5556–5566 Lillicrap et al [2015] Lillicrap TP, Hunt JJ, Pritzel A, et al (2015) Continuous control with deep reinforcement learning. arXiv preprint arXiv:150902971 Lindenberg et al [2022] Lindenberg B, Nordqvist J, Lindahl KO (2022) Conjugated discrete distributions for distributional reinforcement learning. In: AAAI 2022, pp 7516–7524 Ma et al [2020] Ma X, Xia L, Zhou Z, et al (2020) Dsac: distributional soft actor critic for risk-sensitive reinforcement learning. 
arXiv preprint arXiv:200414547 Marzouk et al [2016] Marzouk Y, Moselhy T, Parno M, et al (2016) Sampling via measure transport: An introduction. Handbook of uncertainty quantification 1:2 Mnih et al [2015] Mnih V, Kavukcuoglu K, Silver D, et al (2015) Human-level control through deep reinforcement learning. nature 518(7540):529–533 MORIMURA [2010] MORIMURA T (2010) Parametric return density estimation for reinforcement learning. In: Conference on Uncertainty in Artificial Intelligence, 2010 Nam et al [2021] Nam DW, Kim Y, Park CY (2021) Gmac: A distributional perspective on actor-critic framework. In: ICML 2021, PMLR, pp 7927–7936 Ororbia and Kifer [2022] Ororbia A, Kifer D (2022) The neural coding framework for learning generative models. Nature communications 13(1):2064 Parno and Marzouk [2018] Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Duan J, Guan Y, Li SE, et al (2021) Distributional soft actor-critic: Off-policy reinforcement learning for addressing value estimation errors. 
IEEE transactions on neural networks and learning systems Engel et al [2005] Engel Y, Mannor S, Meir R (2005) Reinforcement learning with gaussian processes. In: Proceedings of the 22nd international conference on Machine learning, pp 201–208 Fujimoto et al [2018] Fujimoto S, Hoof H, Meger D (2018) Addressing function approximation error in actor-critic methods. In: ICML 2018, PMLR, pp 1587–1596 Goodfellow et al [2020] Goodfellow I, Pouget-Abadie J, Mirza M, et al (2020) Generative adversarial networks. Communications of the ACM 63(11):139–144 Haarnoja et al [2017] Haarnoja T, Tang H, Abbeel P, et al (2017) Reinforcement learning with deep energy-based policies. In: ICML 2017, PMLR, pp 1352–1361 Haarnoja et al [2018] Haarnoja T, Zhou A, Hartikainen K, et al (2018) Soft actor-critic algorithms and applications. arXiv preprint arXiv:181205905 Heess et al [2015] Heess N, Wayne G, Silver D, et al (2015) Learning continuous control policies by stochastic value gradients. Advances in neural information processing systems 28 Kingma and Welling [2013] Kingma DP, Welling M (2013) Auto-encoding variational bayes. arXiv preprint arXiv:13126114 Kingma et al [2014] Kingma DP, Mohamed S, Jimenez Rezende D, et al (2014) Semi-supervised learning with deep generative models. NIPS 2014 27 Konda and Tsitsiklis [1999] Konda V, Tsitsiklis J (1999) Actor-critic algorithms. Advances in neural information processing systems 12 Kuznetsov et al [2020] Kuznetsov A, Shvechikov P, Grishin A, et al (2020) Controlling overestimation bias with truncated mixture of continuous distributional quantile critics. In: International Conference on Machine Learning, PMLR, pp 5556–5566 Lillicrap et al [2015] Lillicrap TP, Hunt JJ, Pritzel A, et al (2015) Continuous control with deep reinforcement learning. arXiv preprint arXiv:150902971 Lindenberg et al [2022] Lindenberg B, Nordqvist J, Lindahl KO (2022) Conjugated discrete distributions for distributional reinforcement learning. In: AAAI 2022, pp 7516–7524 Ma et al [2020] Ma X, Xia L, Zhou Z, et al (2020) Dsac: distributional soft actor critic for risk-sensitive reinforcement learning. arXiv preprint arXiv:200414547 Marzouk et al [2016] Marzouk Y, Moselhy T, Parno M, et al (2016) Sampling via measure transport: An introduction. Handbook of uncertainty quantification 1:2 Mnih et al [2015] Mnih V, Kavukcuoglu K, Silver D, et al (2015) Human-level control through deep reinforcement learning. nature 518(7540):529–533 MORIMURA [2010] MORIMURA T (2010) Parametric return density estimation for reinforcement learning. In: Conference on Uncertainty in Artificial Intelligence, 2010 Nam et al [2021] Nam DW, Kim Y, Park CY (2021) Gmac: A distributional perspective on actor-critic framework. In: ICML 2021, PMLR, pp 7927–7936 Ororbia and Kifer [2022] Ororbia A, Kifer D (2022) The neural coding framework for learning generative models. Nature communications 13(1):2064 Parno and Marzouk [2018] Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. 
Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Engel Y, Mannor S, Meir R (2005) Reinforcement learning with gaussian processes. In: Proceedings of the 22nd international conference on Machine learning, pp 201–208 Fujimoto et al [2018] Fujimoto S, Hoof H, Meger D (2018) Addressing function approximation error in actor-critic methods. In: ICML 2018, PMLR, pp 1587–1596 Goodfellow et al [2020] Goodfellow I, Pouget-Abadie J, Mirza M, et al (2020) Generative adversarial networks. Communications of the ACM 63(11):139–144 Haarnoja et al [2017] Haarnoja T, Tang H, Abbeel P, et al (2017) Reinforcement learning with deep energy-based policies. In: ICML 2017, PMLR, pp 1352–1361 Haarnoja et al [2018] Haarnoja T, Zhou A, Hartikainen K, et al (2018) Soft actor-critic algorithms and applications. arXiv preprint arXiv:181205905 Heess et al [2015] Heess N, Wayne G, Silver D, et al (2015) Learning continuous control policies by stochastic value gradients. Advances in neural information processing systems 28 Kingma and Welling [2013] Kingma DP, Welling M (2013) Auto-encoding variational bayes. arXiv preprint arXiv:13126114 Kingma et al [2014] Kingma DP, Mohamed S, Jimenez Rezende D, et al (2014) Semi-supervised learning with deep generative models. NIPS 2014 27 Konda and Tsitsiklis [1999] Konda V, Tsitsiklis J (1999) Actor-critic algorithms. 
Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Ororbia A, Kifer D (2022) The neural coding framework for learning generative models. Nature communications 13(1):2064 Parno and Marzouk [2018] Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. 
Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. 
Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. 
In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. 
Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. 
Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. 
In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. 
Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University
In: International Conference on Machine Learning, PMLR, pp 5556–5566 Lillicrap et al [2015] Lillicrap TP, Hunt JJ, Pritzel A, et al (2015) Continuous control with deep reinforcement learning. arXiv preprint arXiv:150902971 Lindenberg et al [2022] Lindenberg B, Nordqvist J, Lindahl KO (2022) Conjugated discrete distributions for distributional reinforcement learning. In: AAAI 2022, pp 7516–7524 Ma et al [2020] Ma X, Xia L, Zhou Z, et al (2020) Dsac: distributional soft actor critic for risk-sensitive reinforcement learning. arXiv preprint arXiv:200414547 Marzouk et al [2016] Marzouk Y, Moselhy T, Parno M, et al (2016) Sampling via measure transport: An introduction. Handbook of uncertainty quantification 1:2 Mnih et al [2015] Mnih V, Kavukcuoglu K, Silver D, et al (2015) Human-level control through deep reinforcement learning. nature 518(7540):529–533 MORIMURA [2010] MORIMURA T (2010) Parametric return density estimation for reinforcement learning. In: Conference on Uncertainty in Artificial Intelligence, 2010 Nam et al [2021] Nam DW, Kim Y, Park CY (2021) Gmac: A distributional perspective on actor-critic framework. In: ICML 2021, PMLR, pp 7927–7936 Ororbia and Kifer [2022] Ororbia A, Kifer D (2022) The neural coding framework for learning generative models. Nature communications 13(1):2064 Parno and Marzouk [2018] Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. 
NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Choi J, Dance C, Kim Je, et al (2021) Risk-conditioned distributional soft actor-critic for risk-sensitive navigation. In: ICRA 2021, IEEE, pp 8337–8344 Chow et al [2015] Chow Y, Tamar A, Mannor S, et al (2015) Risk-sensitive and robust decision-making: a cvar optimization approach. Advances in neural information processing systems 28 Chow et al [2017] Chow Y, Ghavamzadeh M, Janson L, et al (2017) Risk-constrained reinforcement learning with percentile risk criteria. J Mach Learn Res 18(1):6070–6120 Creswell et al [2018] Creswell A, White T, Dumoulin V, et al (2018) Generative adversarial networks: An overview. IEEE signal processing magazine 35(1):53–65 Dabney et al [2018a] Dabney W, Ostrovski G, Silver D, et al (2018a) Implicit quantile networks for distributional reinforcement learning. In: ICML 2018, PMLR, pp 1096–1105 Dabney et al [2018b] Dabney W, Rowland M, Bellemare M, et al (2018b) Distributional reinforcement learning with quantile regression. In: AAAI 2018 Duan et al [2021] Duan J, Guan Y, Li SE, et al (2021) Distributional soft actor-critic: Off-policy reinforcement learning for addressing value estimation errors. IEEE transactions on neural networks and learning systems Engel et al [2005] Engel Y, Mannor S, Meir R (2005) Reinforcement learning with gaussian processes. In: Proceedings of the 22nd international conference on Machine learning, pp 201–208 Fujimoto et al [2018] Fujimoto S, Hoof H, Meger D (2018) Addressing function approximation error in actor-critic methods. In: ICML 2018, PMLR, pp 1587–1596 Goodfellow et al [2020] Goodfellow I, Pouget-Abadie J, Mirza M, et al (2020) Generative adversarial networks. Communications of the ACM 63(11):139–144 Haarnoja et al [2017] Haarnoja T, Tang H, Abbeel P, et al (2017) Reinforcement learning with deep energy-based policies. In: ICML 2017, PMLR, pp 1352–1361 Haarnoja et al [2018] Haarnoja T, Zhou A, Hartikainen K, et al (2018) Soft actor-critic algorithms and applications. arXiv preprint arXiv:181205905 Heess et al [2015] Heess N, Wayne G, Silver D, et al (2015) Learning continuous control policies by stochastic value gradients. Advances in neural information processing systems 28 Kingma and Welling [2013] Kingma DP, Welling M (2013) Auto-encoding variational bayes. arXiv preprint arXiv:13126114 Kingma et al [2014] Kingma DP, Mohamed S, Jimenez Rezende D, et al (2014) Semi-supervised learning with deep generative models. NIPS 2014 27 Konda and Tsitsiklis [1999] Konda V, Tsitsiklis J (1999) Actor-critic algorithms. Advances in neural information processing systems 12 Kuznetsov et al [2020] Kuznetsov A, Shvechikov P, Grishin A, et al (2020) Controlling overestimation bias with truncated mixture of continuous distributional quantile critics. In: International Conference on Machine Learning, PMLR, pp 5556–5566 Lillicrap et al [2015] Lillicrap TP, Hunt JJ, Pritzel A, et al (2015) Continuous control with deep reinforcement learning. arXiv preprint arXiv:150902971 Lindenberg et al [2022] Lindenberg B, Nordqvist J, Lindahl KO (2022) Conjugated discrete distributions for distributional reinforcement learning. 
In: AAAI 2022, pp 7516–7524 Ma et al [2020] Ma X, Xia L, Zhou Z, et al (2020) Dsac: distributional soft actor critic for risk-sensitive reinforcement learning. arXiv preprint arXiv:200414547 Marzouk et al [2016] Marzouk Y, Moselhy T, Parno M, et al (2016) Sampling via measure transport: An introduction. Handbook of uncertainty quantification 1:2 Mnih et al [2015] Mnih V, Kavukcuoglu K, Silver D, et al (2015) Human-level control through deep reinforcement learning. nature 518(7540):529–533 MORIMURA [2010] MORIMURA T (2010) Parametric return density estimation for reinforcement learning. In: Conference on Uncertainty in Artificial Intelligence, 2010 Nam et al [2021] Nam DW, Kim Y, Park CY (2021) Gmac: A distributional perspective on actor-critic framework. In: ICML 2021, PMLR, pp 7927–7936 Ororbia and Kifer [2022] Ororbia A, Kifer D (2022) The neural coding framework for learning generative models. Nature communications 13(1):2064 Parno and Marzouk [2018] Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. 
Carnegie Mellon University Chow Y, Tamar A, Mannor S, et al (2015) Risk-sensitive and robust decision-making: a cvar optimization approach. Advances in neural information processing systems 28 Chow et al [2017] Chow Y, Ghavamzadeh M, Janson L, et al (2017) Risk-constrained reinforcement learning with percentile risk criteria. J Mach Learn Res 18(1):6070–6120 Creswell et al [2018] Creswell A, White T, Dumoulin V, et al (2018) Generative adversarial networks: An overview. IEEE signal processing magazine 35(1):53–65 Dabney et al [2018a] Dabney W, Ostrovski G, Silver D, et al (2018a) Implicit quantile networks for distributional reinforcement learning. In: ICML 2018, PMLR, pp 1096–1105 Dabney et al [2018b] Dabney W, Rowland M, Bellemare M, et al (2018b) Distributional reinforcement learning with quantile regression. In: AAAI 2018 Duan et al [2021] Duan J, Guan Y, Li SE, et al (2021) Distributional soft actor-critic: Off-policy reinforcement learning for addressing value estimation errors. IEEE transactions on neural networks and learning systems Engel et al [2005] Engel Y, Mannor S, Meir R (2005) Reinforcement learning with gaussian processes. In: Proceedings of the 22nd international conference on Machine learning, pp 201–208 Fujimoto et al [2018] Fujimoto S, Hoof H, Meger D (2018) Addressing function approximation error in actor-critic methods. In: ICML 2018, PMLR, pp 1587–1596 Goodfellow et al [2020] Goodfellow I, Pouget-Abadie J, Mirza M, et al (2020) Generative adversarial networks. Communications of the ACM 63(11):139–144 Haarnoja et al [2017] Haarnoja T, Tang H, Abbeel P, et al (2017) Reinforcement learning with deep energy-based policies. In: ICML 2017, PMLR, pp 1352–1361 Haarnoja et al [2018] Haarnoja T, Zhou A, Hartikainen K, et al (2018) Soft actor-critic algorithms and applications. arXiv preprint arXiv:181205905 Heess et al [2015] Heess N, Wayne G, Silver D, et al (2015) Learning continuous control policies by stochastic value gradients. Advances in neural information processing systems 28 Kingma and Welling [2013] Kingma DP, Welling M (2013) Auto-encoding variational bayes. arXiv preprint arXiv:13126114 Kingma et al [2014] Kingma DP, Mohamed S, Jimenez Rezende D, et al (2014) Semi-supervised learning with deep generative models. NIPS 2014 27 Konda and Tsitsiklis [1999] Konda V, Tsitsiklis J (1999) Actor-critic algorithms. Advances in neural information processing systems 12 Kuznetsov et al [2020] Kuznetsov A, Shvechikov P, Grishin A, et al (2020) Controlling overestimation bias with truncated mixture of continuous distributional quantile critics. In: International Conference on Machine Learning, PMLR, pp 5556–5566 Lillicrap et al [2015] Lillicrap TP, Hunt JJ, Pritzel A, et al (2015) Continuous control with deep reinforcement learning. arXiv preprint arXiv:150902971 Lindenberg et al [2022] Lindenberg B, Nordqvist J, Lindahl KO (2022) Conjugated discrete distributions for distributional reinforcement learning. In: AAAI 2022, pp 7516–7524 Ma et al [2020] Ma X, Xia L, Zhou Z, et al (2020) Dsac: distributional soft actor critic for risk-sensitive reinforcement learning. arXiv preprint arXiv:200414547 Marzouk et al [2016] Marzouk Y, Moselhy T, Parno M, et al (2016) Sampling via measure transport: An introduction. Handbook of uncertainty quantification 1:2 Mnih et al [2015] Mnih V, Kavukcuoglu K, Silver D, et al (2015) Human-level control through deep reinforcement learning. 
nature 518(7540):529–533 MORIMURA [2010] MORIMURA T (2010) Parametric return density estimation for reinforcement learning. In: Conference on Uncertainty in Artificial Intelligence, 2010 Nam et al [2021] Nam DW, Kim Y, Park CY (2021) Gmac: A distributional perspective on actor-critic framework. In: ICML 2021, PMLR, pp 7927–7936 Ororbia and Kifer [2022] Ororbia A, Kifer D (2022) The neural coding framework for learning generative models. Nature communications 13(1):2064 Parno and Marzouk [2018] Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Chow Y, Ghavamzadeh M, Janson L, et al (2017) Risk-constrained reinforcement learning with percentile risk criteria. J Mach Learn Res 18(1):6070–6120 Creswell et al [2018] Creswell A, White T, Dumoulin V, et al (2018) Generative adversarial networks: An overview. IEEE signal processing magazine 35(1):53–65 Dabney et al [2018a] Dabney W, Ostrovski G, Silver D, et al (2018a) Implicit quantile networks for distributional reinforcement learning. 
In: ICML 2018, PMLR, pp 1096–1105 Dabney et al [2018b] Dabney W, Rowland M, Bellemare M, et al (2018b) Distributional reinforcement learning with quantile regression. In: AAAI 2018 Duan et al [2021] Duan J, Guan Y, Li SE, et al (2021) Distributional soft actor-critic: Off-policy reinforcement learning for addressing value estimation errors. IEEE transactions on neural networks and learning systems Engel et al [2005] Engel Y, Mannor S, Meir R (2005) Reinforcement learning with gaussian processes. In: Proceedings of the 22nd international conference on Machine learning, pp 201–208 Fujimoto et al [2018] Fujimoto S, Hoof H, Meger D (2018) Addressing function approximation error in actor-critic methods. In: ICML 2018, PMLR, pp 1587–1596 Goodfellow et al [2020] Goodfellow I, Pouget-Abadie J, Mirza M, et al (2020) Generative adversarial networks. Communications of the ACM 63(11):139–144 Haarnoja et al [2017] Haarnoja T, Tang H, Abbeel P, et al (2017) Reinforcement learning with deep energy-based policies. In: ICML 2017, PMLR, pp 1352–1361 Haarnoja et al [2018] Haarnoja T, Zhou A, Hartikainen K, et al (2018) Soft actor-critic algorithms and applications. arXiv preprint arXiv:181205905 Heess et al [2015] Heess N, Wayne G, Silver D, et al (2015) Learning continuous control policies by stochastic value gradients. Advances in neural information processing systems 28 Kingma and Welling [2013] Kingma DP, Welling M (2013) Auto-encoding variational bayes. arXiv preprint arXiv:13126114 Kingma et al [2014] Kingma DP, Mohamed S, Jimenez Rezende D, et al (2014) Semi-supervised learning with deep generative models. NIPS 2014 27 Konda and Tsitsiklis [1999] Konda V, Tsitsiklis J (1999) Actor-critic algorithms. Advances in neural information processing systems 12 Kuznetsov et al [2020] Kuznetsov A, Shvechikov P, Grishin A, et al (2020) Controlling overestimation bias with truncated mixture of continuous distributional quantile critics. In: International Conference on Machine Learning, PMLR, pp 5556–5566 Lillicrap et al [2015] Lillicrap TP, Hunt JJ, Pritzel A, et al (2015) Continuous control with deep reinforcement learning. arXiv preprint arXiv:150902971 Lindenberg et al [2022] Lindenberg B, Nordqvist J, Lindahl KO (2022) Conjugated discrete distributions for distributional reinforcement learning. In: AAAI 2022, pp 7516–7524 Ma et al [2020] Ma X, Xia L, Zhou Z, et al (2020) Dsac: distributional soft actor critic for risk-sensitive reinforcement learning. arXiv preprint arXiv:200414547 Marzouk et al [2016] Marzouk Y, Moselhy T, Parno M, et al (2016) Sampling via measure transport: An introduction. Handbook of uncertainty quantification 1:2 Mnih et al [2015] Mnih V, Kavukcuoglu K, Silver D, et al (2015) Human-level control through deep reinforcement learning. nature 518(7540):529–533 MORIMURA [2010] MORIMURA T (2010) Parametric return density estimation for reinforcement learning. In: Conference on Uncertainty in Artificial Intelligence, 2010 Nam et al [2021] Nam DW, Kim Y, Park CY (2021) Gmac: A distributional perspective on actor-critic framework. In: ICML 2021, PMLR, pp 7927–7936 Ororbia and Kifer [2022] Ororbia A, Kifer D (2022) The neural coding framework for learning generative models. Nature communications 13(1):2064 Parno and Marzouk [2018] Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. 
SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Creswell A, White T, Dumoulin V, et al (2018) Generative adversarial networks: An overview. IEEE signal processing magazine 35(1):53–65 Dabney et al [2018a] Dabney W, Ostrovski G, Silver D, et al (2018a) Implicit quantile networks for distributional reinforcement learning. In: ICML 2018, PMLR, pp 1096–1105 Dabney et al [2018b] Dabney W, Rowland M, Bellemare M, et al (2018b) Distributional reinforcement learning with quantile regression. In: AAAI 2018 Duan et al [2021] Duan J, Guan Y, Li SE, et al (2021) Distributional soft actor-critic: Off-policy reinforcement learning for addressing value estimation errors. IEEE transactions on neural networks and learning systems Engel et al [2005] Engel Y, Mannor S, Meir R (2005) Reinforcement learning with gaussian processes. In: Proceedings of the 22nd international conference on Machine learning, pp 201–208 Fujimoto et al [2018] Fujimoto S, Hoof H, Meger D (2018) Addressing function approximation error in actor-critic methods. 
In: ICML 2018, PMLR, pp 1587–1596 Goodfellow et al [2020] Goodfellow I, Pouget-Abadie J, Mirza M, et al (2020) Generative adversarial networks. Communications of the ACM 63(11):139–144 Haarnoja et al [2017] Haarnoja T, Tang H, Abbeel P, et al (2017) Reinforcement learning with deep energy-based policies. In: ICML 2017, PMLR, pp 1352–1361 Haarnoja et al [2018] Haarnoja T, Zhou A, Hartikainen K, et al (2018) Soft actor-critic algorithms and applications. arXiv preprint arXiv:181205905 Heess et al [2015] Heess N, Wayne G, Silver D, et al (2015) Learning continuous control policies by stochastic value gradients. Advances in neural information processing systems 28 Kingma and Welling [2013] Kingma DP, Welling M (2013) Auto-encoding variational bayes. arXiv preprint arXiv:13126114 Kingma et al [2014] Kingma DP, Mohamed S, Jimenez Rezende D, et al (2014) Semi-supervised learning with deep generative models. NIPS 2014 27 Konda and Tsitsiklis [1999] Konda V, Tsitsiklis J (1999) Actor-critic algorithms. Advances in neural information processing systems 12 Kuznetsov et al [2020] Kuznetsov A, Shvechikov P, Grishin A, et al (2020) Controlling overestimation bias with truncated mixture of continuous distributional quantile critics. In: International Conference on Machine Learning, PMLR, pp 5556–5566 Lillicrap et al [2015] Lillicrap TP, Hunt JJ, Pritzel A, et al (2015) Continuous control with deep reinforcement learning. arXiv preprint arXiv:150902971 Lindenberg et al [2022] Lindenberg B, Nordqvist J, Lindahl KO (2022) Conjugated discrete distributions for distributional reinforcement learning. In: AAAI 2022, pp 7516–7524 Ma et al [2020] Ma X, Xia L, Zhou Z, et al (2020) Dsac: distributional soft actor critic for risk-sensitive reinforcement learning. arXiv preprint arXiv:200414547 Marzouk et al [2016] Marzouk Y, Moselhy T, Parno M, et al (2016) Sampling via measure transport: An introduction. Handbook of uncertainty quantification 1:2 Mnih et al [2015] Mnih V, Kavukcuoglu K, Silver D, et al (2015) Human-level control through deep reinforcement learning. nature 518(7540):529–533 MORIMURA [2010] MORIMURA T (2010) Parametric return density estimation for reinforcement learning. In: Conference on Uncertainty in Artificial Intelligence, 2010 Nam et al [2021] Nam DW, Kim Y, Park CY (2021) Gmac: A distributional perspective on actor-critic framework. In: ICML 2021, PMLR, pp 7927–7936 Ororbia and Kifer [2022] Ororbia A, Kifer D (2022) The neural coding framework for learning generative models. Nature communications 13(1):2064 Parno and Marzouk [2018] Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. 
In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Dabney W, Ostrovski G, Silver D, et al (2018a) Implicit quantile networks for distributional reinforcement learning. In: ICML 2018, PMLR, pp 1096–1105 Dabney et al [2018b] Dabney W, Rowland M, Bellemare M, et al (2018b) Distributional reinforcement learning with quantile regression. In: AAAI 2018 Duan et al [2021] Duan J, Guan Y, Li SE, et al (2021) Distributional soft actor-critic: Off-policy reinforcement learning for addressing value estimation errors. IEEE transactions on neural networks and learning systems Engel et al [2005] Engel Y, Mannor S, Meir R (2005) Reinforcement learning with gaussian processes. In: Proceedings of the 22nd international conference on Machine learning, pp 201–208 Fujimoto et al [2018] Fujimoto S, Hoof H, Meger D (2018) Addressing function approximation error in actor-critic methods. In: ICML 2018, PMLR, pp 1587–1596 Goodfellow et al [2020] Goodfellow I, Pouget-Abadie J, Mirza M, et al (2020) Generative adversarial networks. Communications of the ACM 63(11):139–144 Haarnoja et al [2017] Haarnoja T, Tang H, Abbeel P, et al (2017) Reinforcement learning with deep energy-based policies. In: ICML 2017, PMLR, pp 1352–1361 Haarnoja et al [2018] Haarnoja T, Zhou A, Hartikainen K, et al (2018) Soft actor-critic algorithms and applications. arXiv preprint arXiv:181205905 Heess et al [2015] Heess N, Wayne G, Silver D, et al (2015) Learning continuous control policies by stochastic value gradients. Advances in neural information processing systems 28 Kingma and Welling [2013] Kingma DP, Welling M (2013) Auto-encoding variational bayes. arXiv preprint arXiv:13126114 Kingma et al [2014] Kingma DP, Mohamed S, Jimenez Rezende D, et al (2014) Semi-supervised learning with deep generative models. NIPS 2014 27 Konda and Tsitsiklis [1999] Konda V, Tsitsiklis J (1999) Actor-critic algorithms. 
Advances in neural information processing systems 12 Kuznetsov et al [2020] Kuznetsov A, Shvechikov P, Grishin A, et al (2020) Controlling overestimation bias with truncated mixture of continuous distributional quantile critics. In: International Conference on Machine Learning, PMLR, pp 5556–5566 Lillicrap et al [2015] Lillicrap TP, Hunt JJ, Pritzel A, et al (2015) Continuous control with deep reinforcement learning. arXiv preprint arXiv:150902971 Lindenberg et al [2022] Lindenberg B, Nordqvist J, Lindahl KO (2022) Conjugated discrete distributions for distributional reinforcement learning. In: AAAI 2022, pp 7516–7524 Ma et al [2020] Ma X, Xia L, Zhou Z, et al (2020) Dsac: distributional soft actor critic for risk-sensitive reinforcement learning. arXiv preprint arXiv:200414547 Marzouk et al [2016] Marzouk Y, Moselhy T, Parno M, et al (2016) Sampling via measure transport: An introduction. Handbook of uncertainty quantification 1:2 Mnih et al [2015] Mnih V, Kavukcuoglu K, Silver D, et al (2015) Human-level control through deep reinforcement learning. nature 518(7540):529–533 MORIMURA [2010] MORIMURA T (2010) Parametric return density estimation for reinforcement learning. In: Conference on Uncertainty in Artificial Intelligence, 2010 Nam et al [2021] Nam DW, Kim Y, Park CY (2021) Gmac: A distributional perspective on actor-critic framework. In: ICML 2021, PMLR, pp 7927–7936 Ororbia and Kifer [2022] Ororbia A, Kifer D (2022) The neural coding framework for learning generative models. Nature communications 13(1):2064 Parno and Marzouk [2018] Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. 
MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Dabney W, Rowland M, Bellemare M, et al (2018b) Distributional reinforcement learning with quantile regression. In: AAAI 2018 Duan et al [2021] Duan J, Guan Y, Li SE, et al (2021) Distributional soft actor-critic: Off-policy reinforcement learning for addressing value estimation errors. IEEE transactions on neural networks and learning systems Engel et al [2005] Engel Y, Mannor S, Meir R (2005) Reinforcement learning with gaussian processes. In: Proceedings of the 22nd international conference on Machine learning, pp 201–208 Fujimoto et al [2018] Fujimoto S, Hoof H, Meger D (2018) Addressing function approximation error in actor-critic methods. In: ICML 2018, PMLR, pp 1587–1596 Goodfellow et al [2020] Goodfellow I, Pouget-Abadie J, Mirza M, et al (2020) Generative adversarial networks. Communications of the ACM 63(11):139–144 Haarnoja et al [2017] Haarnoja T, Tang H, Abbeel P, et al (2017) Reinforcement learning with deep energy-based policies. In: ICML 2017, PMLR, pp 1352–1361 Haarnoja et al [2018] Haarnoja T, Zhou A, Hartikainen K, et al (2018) Soft actor-critic algorithms and applications. arXiv preprint arXiv:181205905 Heess et al [2015] Heess N, Wayne G, Silver D, et al (2015) Learning continuous control policies by stochastic value gradients. Advances in neural information processing systems 28 Kingma and Welling [2013] Kingma DP, Welling M (2013) Auto-encoding variational bayes. arXiv preprint arXiv:13126114 Kingma et al [2014] Kingma DP, Mohamed S, Jimenez Rezende D, et al (2014) Semi-supervised learning with deep generative models. NIPS 2014 27 Konda and Tsitsiklis [1999] Konda V, Tsitsiklis J (1999) Actor-critic algorithms. Advances in neural information processing systems 12 Kuznetsov et al [2020] Kuznetsov A, Shvechikov P, Grishin A, et al (2020) Controlling overestimation bias with truncated mixture of continuous distributional quantile critics. In: International Conference on Machine Learning, PMLR, pp 5556–5566 Lillicrap et al [2015] Lillicrap TP, Hunt JJ, Pritzel A, et al (2015) Continuous control with deep reinforcement learning. arXiv preprint arXiv:150902971 Lindenberg et al [2022] Lindenberg B, Nordqvist J, Lindahl KO (2022) Conjugated discrete distributions for distributional reinforcement learning. In: AAAI 2022, pp 7516–7524 Ma et al [2020] Ma X, Xia L, Zhou Z, et al (2020) Dsac: distributional soft actor critic for risk-sensitive reinforcement learning. arXiv preprint arXiv:200414547 Marzouk et al [2016] Marzouk Y, Moselhy T, Parno M, et al (2016) Sampling via measure transport: An introduction. Handbook of uncertainty quantification 1:2 Mnih et al [2015] Mnih V, Kavukcuoglu K, Silver D, et al (2015) Human-level control through deep reinforcement learning. nature 518(7540):529–533 MORIMURA [2010] MORIMURA T (2010) Parametric return density estimation for reinforcement learning. 
In: Conference on Uncertainty in Artificial Intelligence, 2010 Nam et al [2021] Nam DW, Kim Y, Park CY (2021) Gmac: A distributional perspective on actor-critic framework. In: ICML 2021, PMLR, pp 7927–7936 Ororbia and Kifer [2022] Ororbia A, Kifer D (2022) The neural coding framework for learning generative models. Nature communications 13(1):2064 Parno and Marzouk [2018] Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Duan J, Guan Y, Li SE, et al (2021) Distributional soft actor-critic: Off-policy reinforcement learning for addressing value estimation errors. IEEE transactions on neural networks and learning systems Engel et al [2005] Engel Y, Mannor S, Meir R (2005) Reinforcement learning with gaussian processes. In: Proceedings of the 22nd international conference on Machine learning, pp 201–208 Fujimoto et al [2018] Fujimoto S, Hoof H, Meger D (2018) Addressing function approximation error in actor-critic methods. In: ICML 2018, PMLR, pp 1587–1596 Goodfellow et al [2020] Goodfellow I, Pouget-Abadie J, Mirza M, et al (2020) Generative adversarial networks. 
Communications of the ACM 63(11):139–144 Haarnoja et al [2017] Haarnoja T, Tang H, Abbeel P, et al (2017) Reinforcement learning with deep energy-based policies. In: ICML 2017, PMLR, pp 1352–1361 Haarnoja et al [2018] Haarnoja T, Zhou A, Hartikainen K, et al (2018) Soft actor-critic algorithms and applications. arXiv preprint arXiv:181205905 Heess et al [2015] Heess N, Wayne G, Silver D, et al (2015) Learning continuous control policies by stochastic value gradients. Advances in neural information processing systems 28 Kingma and Welling [2013] Kingma DP, Welling M (2013) Auto-encoding variational bayes. arXiv preprint arXiv:13126114 Kingma et al [2014] Kingma DP, Mohamed S, Jimenez Rezende D, et al (2014) Semi-supervised learning with deep generative models. NIPS 2014 27 Konda and Tsitsiklis [1999] Konda V, Tsitsiklis J (1999) Actor-critic algorithms. Advances in neural information processing systems 12 Kuznetsov et al [2020] Kuznetsov A, Shvechikov P, Grishin A, et al (2020) Controlling overestimation bias with truncated mixture of continuous distributional quantile critics. In: International Conference on Machine Learning, PMLR, pp 5556–5566 Lillicrap et al [2015] Lillicrap TP, Hunt JJ, Pritzel A, et al (2015) Continuous control with deep reinforcement learning. arXiv preprint arXiv:150902971 Lindenberg et al [2022] Lindenberg B, Nordqvist J, Lindahl KO (2022) Conjugated discrete distributions for distributional reinforcement learning. In: AAAI 2022, pp 7516–7524 Ma et al [2020] Ma X, Xia L, Zhou Z, et al (2020) Dsac: distributional soft actor critic for risk-sensitive reinforcement learning. arXiv preprint arXiv:200414547 Marzouk et al [2016] Marzouk Y, Moselhy T, Parno M, et al (2016) Sampling via measure transport: An introduction. Handbook of uncertainty quantification 1:2 Mnih et al [2015] Mnih V, Kavukcuoglu K, Silver D, et al (2015) Human-level control through deep reinforcement learning. nature 518(7540):529–533 MORIMURA [2010] MORIMURA T (2010) Parametric return density estimation for reinforcement learning. In: Conference on Uncertainty in Artificial Intelligence, 2010 Nam et al [2021] Nam DW, Kim Y, Park CY (2021) Gmac: A distributional perspective on actor-critic framework. In: ICML 2021, PMLR, pp 7927–7936 Ororbia and Kifer [2022] Ororbia A, Kifer D (2022) The neural coding framework for learning generative models. Nature communications 13(1):2064 Parno and Marzouk [2018] Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. 
Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Engel Y, Mannor S, Meir R (2005) Reinforcement learning with gaussian processes. In: Proceedings of the 22nd international conference on Machine learning, pp 201–208 Fujimoto et al [2018] Fujimoto S, Hoof H, Meger D (2018) Addressing function approximation error in actor-critic methods. In: ICML 2018, PMLR, pp 1587–1596 Goodfellow et al [2020] Goodfellow I, Pouget-Abadie J, Mirza M, et al (2020) Generative adversarial networks. Communications of the ACM 63(11):139–144 Haarnoja et al [2017] Haarnoja T, Tang H, Abbeel P, et al (2017) Reinforcement learning with deep energy-based policies. In: ICML 2017, PMLR, pp 1352–1361 Haarnoja et al [2018] Haarnoja T, Zhou A, Hartikainen K, et al (2018) Soft actor-critic algorithms and applications. arXiv preprint arXiv:181205905 Heess et al [2015] Heess N, Wayne G, Silver D, et al (2015) Learning continuous control policies by stochastic value gradients. Advances in neural information processing systems 28 Kingma and Welling [2013] Kingma DP, Welling M (2013) Auto-encoding variational bayes. arXiv preprint arXiv:13126114 Kingma et al [2014] Kingma DP, Mohamed S, Jimenez Rezende D, et al (2014) Semi-supervised learning with deep generative models. NIPS 2014 27 Konda and Tsitsiklis [1999] Konda V, Tsitsiklis J (1999) Actor-critic algorithms. Advances in neural information processing systems 12 Kuznetsov et al [2020] Kuznetsov A, Shvechikov P, Grishin A, et al (2020) Controlling overestimation bias with truncated mixture of continuous distributional quantile critics. In: International Conference on Machine Learning, PMLR, pp 5556–5566 Lillicrap et al [2015] Lillicrap TP, Hunt JJ, Pritzel A, et al (2015) Continuous control with deep reinforcement learning. arXiv preprint arXiv:150902971 Lindenberg et al [2022] Lindenberg B, Nordqvist J, Lindahl KO (2022) Conjugated discrete distributions for distributional reinforcement learning. In: AAAI 2022, pp 7516–7524 Ma et al [2020] Ma X, Xia L, Zhou Z, et al (2020) Dsac: distributional soft actor critic for risk-sensitive reinforcement learning. 
arXiv preprint arXiv:200414547 Marzouk et al [2016] Marzouk Y, Moselhy T, Parno M, et al (2016) Sampling via measure transport: An introduction. Handbook of uncertainty quantification 1:2 Mnih et al [2015] Mnih V, Kavukcuoglu K, Silver D, et al (2015) Human-level control through deep reinforcement learning. nature 518(7540):529–533 MORIMURA [2010] MORIMURA T (2010) Parametric return density estimation for reinforcement learning. In: Conference on Uncertainty in Artificial Intelligence, 2010 Nam et al [2021] Nam DW, Kim Y, Park CY (2021) Gmac: A distributional perspective on actor-critic framework. In: ICML 2021, PMLR, pp 7927–7936 Ororbia and Kifer [2022] Ororbia A, Kifer D (2022) The neural coding framework for learning generative models. Nature communications 13(1):2064 Parno and Marzouk [2018] Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Fujimoto S, Hoof H, Meger D (2018) Addressing function approximation error in actor-critic methods. 
In: ICML 2018, PMLR, pp 1587–1596 Goodfellow et al [2020] Goodfellow I, Pouget-Abadie J, Mirza M, et al (2020) Generative adversarial networks. Communications of the ACM 63(11):139–144 Haarnoja et al [2017] Haarnoja T, Tang H, Abbeel P, et al (2017) Reinforcement learning with deep energy-based policies. In: ICML 2017, PMLR, pp 1352–1361 Haarnoja et al [2018] Haarnoja T, Zhou A, Hartikainen K, et al (2018) Soft actor-critic algorithms and applications. arXiv preprint arXiv:181205905 Heess et al [2015] Heess N, Wayne G, Silver D, et al (2015) Learning continuous control policies by stochastic value gradients. Advances in neural information processing systems 28 Kingma and Welling [2013] Kingma DP, Welling M (2013) Auto-encoding variational bayes. arXiv preprint arXiv:13126114 Kingma et al [2014] Kingma DP, Mohamed S, Jimenez Rezende D, et al (2014) Semi-supervised learning with deep generative models. NIPS 2014 27 Konda and Tsitsiklis [1999] Konda V, Tsitsiklis J (1999) Actor-critic algorithms. Advances in neural information processing systems 12 Kuznetsov et al [2020] Kuznetsov A, Shvechikov P, Grishin A, et al (2020) Controlling overestimation bias with truncated mixture of continuous distributional quantile critics. In: International Conference on Machine Learning, PMLR, pp 5556–5566 Lillicrap et al [2015] Lillicrap TP, Hunt JJ, Pritzel A, et al (2015) Continuous control with deep reinforcement learning. arXiv preprint arXiv:150902971 Lindenberg et al [2022] Lindenberg B, Nordqvist J, Lindahl KO (2022) Conjugated discrete distributions for distributional reinforcement learning. In: AAAI 2022, pp 7516–7524 Ma et al [2020] Ma X, Xia L, Zhou Z, et al (2020) Dsac: distributional soft actor critic for risk-sensitive reinforcement learning. arXiv preprint arXiv:200414547 Marzouk et al [2016] Marzouk Y, Moselhy T, Parno M, et al (2016) Sampling via measure transport: An introduction. Handbook of uncertainty quantification 1:2 Mnih et al [2015] Mnih V, Kavukcuoglu K, Silver D, et al (2015) Human-level control through deep reinforcement learning. nature 518(7540):529–533 MORIMURA [2010] MORIMURA T (2010) Parametric return density estimation for reinforcement learning. In: Conference on Uncertainty in Artificial Intelligence, 2010 Nam et al [2021] Nam DW, Kim Y, Park CY (2021) Gmac: A distributional perspective on actor-critic framework. In: ICML 2021, PMLR, pp 7927–7936 Ororbia and Kifer [2022] Ororbia A, Kifer D (2022) The neural coding framework for learning generative models. Nature communications 13(1):2064 Parno and Marzouk [2018] Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. 
In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Goodfellow I, Pouget-Abadie J, Mirza M, et al (2020) Generative adversarial networks. Communications of the ACM 63(11):139–144 Haarnoja et al [2017] Haarnoja T, Tang H, Abbeel P, et al (2017) Reinforcement learning with deep energy-based policies. In: ICML 2017, PMLR, pp 1352–1361 Haarnoja et al [2018] Haarnoja T, Zhou A, Hartikainen K, et al (2018) Soft actor-critic algorithms and applications. arXiv preprint arXiv:181205905 Heess et al [2015] Heess N, Wayne G, Silver D, et al (2015) Learning continuous control policies by stochastic value gradients. Advances in neural information processing systems 28 Kingma and Welling [2013] Kingma DP, Welling M (2013) Auto-encoding variational bayes. arXiv preprint arXiv:13126114 Kingma et al [2014] Kingma DP, Mohamed S, Jimenez Rezende D, et al (2014) Semi-supervised learning with deep generative models. NIPS 2014 27 Konda and Tsitsiklis [1999] Konda V, Tsitsiklis J (1999) Actor-critic algorithms. Advances in neural information processing systems 12 Kuznetsov et al [2020] Kuznetsov A, Shvechikov P, Grishin A, et al (2020) Controlling overestimation bias with truncated mixture of continuous distributional quantile critics. In: International Conference on Machine Learning, PMLR, pp 5556–5566 Lillicrap et al [2015] Lillicrap TP, Hunt JJ, Pritzel A, et al (2015) Continuous control with deep reinforcement learning. arXiv preprint arXiv:150902971 Lindenberg et al [2022] Lindenberg B, Nordqvist J, Lindahl KO (2022) Conjugated discrete distributions for distributional reinforcement learning. In: AAAI 2022, pp 7516–7524 Ma et al [2020] Ma X, Xia L, Zhou Z, et al (2020) Dsac: distributional soft actor critic for risk-sensitive reinforcement learning. arXiv preprint arXiv:200414547 Marzouk et al [2016] Marzouk Y, Moselhy T, Parno M, et al (2016) Sampling via measure transport: An introduction. 
Handbook of uncertainty quantification 1:2 Mnih et al [2015] Mnih V, Kavukcuoglu K, Silver D, et al (2015) Human-level control through deep reinforcement learning. nature 518(7540):529–533 MORIMURA [2010] MORIMURA T (2010) Parametric return density estimation for reinforcement learning. In: Conference on Uncertainty in Artificial Intelligence, 2010 Nam et al [2021] Nam DW, Kim Y, Park CY (2021) Gmac: A distributional perspective on actor-critic framework. In: ICML 2021, PMLR, pp 7927–7936 Ororbia and Kifer [2022] Ororbia A, Kifer D (2022) The neural coding framework for learning generative models. Nature communications 13(1):2064 Parno and Marzouk [2018] Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Haarnoja T, Tang H, Abbeel P, et al (2017) Reinforcement learning with deep energy-based policies. In: ICML 2017, PMLR, pp 1352–1361 Haarnoja et al [2018] Haarnoja T, Zhou A, Hartikainen K, et al (2018) Soft actor-critic algorithms and applications. 
arXiv preprint arXiv:181205905 Heess et al [2015] Heess N, Wayne G, Silver D, et al (2015) Learning continuous control policies by stochastic value gradients. Advances in neural information processing systems 28 Kingma and Welling [2013] Kingma DP, Welling M (2013) Auto-encoding variational bayes. arXiv preprint arXiv:13126114 Kingma et al [2014] Kingma DP, Mohamed S, Jimenez Rezende D, et al (2014) Semi-supervised learning with deep generative models. NIPS 2014 27 Konda and Tsitsiklis [1999] Konda V, Tsitsiklis J (1999) Actor-critic algorithms. Advances in neural information processing systems 12 Kuznetsov et al [2020] Kuznetsov A, Shvechikov P, Grishin A, et al (2020) Controlling overestimation bias with truncated mixture of continuous distributional quantile critics. In: International Conference on Machine Learning, PMLR, pp 5556–5566 Lillicrap et al [2015] Lillicrap TP, Hunt JJ, Pritzel A, et al (2015) Continuous control with deep reinforcement learning. arXiv preprint arXiv:150902971 Lindenberg et al [2022] Lindenberg B, Nordqvist J, Lindahl KO (2022) Conjugated discrete distributions for distributional reinforcement learning. In: AAAI 2022, pp 7516–7524 Ma et al [2020] Ma X, Xia L, Zhou Z, et al (2020) Dsac: distributional soft actor critic for risk-sensitive reinforcement learning. arXiv preprint arXiv:200414547 Marzouk et al [2016] Marzouk Y, Moselhy T, Parno M, et al (2016) Sampling via measure transport: An introduction. Handbook of uncertainty quantification 1:2 Mnih et al [2015] Mnih V, Kavukcuoglu K, Silver D, et al (2015) Human-level control through deep reinforcement learning. nature 518(7540):529–533 MORIMURA [2010] MORIMURA T (2010) Parametric return density estimation for reinforcement learning. In: Conference on Uncertainty in Artificial Intelligence, 2010 Nam et al [2021] Nam DW, Kim Y, Park CY (2021) Gmac: A distributional perspective on actor-critic framework. In: ICML 2021, PMLR, pp 7927–7936 Ororbia and Kifer [2022] Ororbia A, Kifer D (2022) The neural coding framework for learning generative models. Nature communications 13(1):2064 Parno and Marzouk [2018] Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. 
arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Haarnoja T, Zhou A, Hartikainen K, et al (2018) Soft actor-critic algorithms and applications. arXiv preprint arXiv:181205905 Heess et al [2015] Heess N, Wayne G, Silver D, et al (2015) Learning continuous control policies by stochastic value gradients. Advances in neural information processing systems 28 Kingma and Welling [2013] Kingma DP, Welling M (2013) Auto-encoding variational bayes. arXiv preprint arXiv:13126114 Kingma et al [2014] Kingma DP, Mohamed S, Jimenez Rezende D, et al (2014) Semi-supervised learning with deep generative models. NIPS 2014 27 Konda and Tsitsiklis [1999] Konda V, Tsitsiklis J (1999) Actor-critic algorithms. Advances in neural information processing systems 12 Kuznetsov et al [2020] Kuznetsov A, Shvechikov P, Grishin A, et al (2020) Controlling overestimation bias with truncated mixture of continuous distributional quantile critics. In: International Conference on Machine Learning, PMLR, pp 5556–5566 Lillicrap et al [2015] Lillicrap TP, Hunt JJ, Pritzel A, et al (2015) Continuous control with deep reinforcement learning. arXiv preprint arXiv:150902971 Lindenberg et al [2022] Lindenberg B, Nordqvist J, Lindahl KO (2022) Conjugated discrete distributions for distributional reinforcement learning. In: AAAI 2022, pp 7516–7524 Ma et al [2020] Ma X, Xia L, Zhou Z, et al (2020) Dsac: distributional soft actor critic for risk-sensitive reinforcement learning. arXiv preprint arXiv:200414547 Marzouk et al [2016] Marzouk Y, Moselhy T, Parno M, et al (2016) Sampling via measure transport: An introduction. Handbook of uncertainty quantification 1:2 Mnih et al [2015] Mnih V, Kavukcuoglu K, Silver D, et al (2015) Human-level control through deep reinforcement learning. nature 518(7540):529–533 MORIMURA [2010] MORIMURA T (2010) Parametric return density estimation for reinforcement learning. In: Conference on Uncertainty in Artificial Intelligence, 2010 Nam et al [2021] Nam DW, Kim Y, Park CY (2021) Gmac: A distributional perspective on actor-critic framework. In: ICML 2021, PMLR, pp 7927–7936 Ororbia and Kifer [2022] Ororbia A, Kifer D (2022) The neural coding framework for learning generative models. Nature communications 13(1):2064 Parno and Marzouk [2018] Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. 
SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Heess N, Wayne G, Silver D, et al (2015) Learning continuous control policies by stochastic value gradients. Advances in neural information processing systems 28 Kingma and Welling [2013] Kingma DP, Welling M (2013) Auto-encoding variational bayes. arXiv preprint arXiv:13126114 Kingma et al [2014] Kingma DP, Mohamed S, Jimenez Rezende D, et al (2014) Semi-supervised learning with deep generative models. NIPS 2014 27 Konda and Tsitsiklis [1999] Konda V, Tsitsiklis J (1999) Actor-critic algorithms. Advances in neural information processing systems 12 Kuznetsov et al [2020] Kuznetsov A, Shvechikov P, Grishin A, et al (2020) Controlling overestimation bias with truncated mixture of continuous distributional quantile critics. In: International Conference on Machine Learning, PMLR, pp 5556–5566 Lillicrap et al [2015] Lillicrap TP, Hunt JJ, Pritzel A, et al (2015) Continuous control with deep reinforcement learning. arXiv preprint arXiv:150902971 Lindenberg et al [2022] Lindenberg B, Nordqvist J, Lindahl KO (2022) Conjugated discrete distributions for distributional reinforcement learning. 
In: AAAI 2022, pp 7516–7524 Ma et al [2020] Ma X, Xia L, Zhou Z, et al (2020) Dsac: distributional soft actor critic for risk-sensitive reinforcement learning. arXiv preprint arXiv:200414547 Marzouk et al [2016] Marzouk Y, Moselhy T, Parno M, et al (2016) Sampling via measure transport: An introduction. Handbook of uncertainty quantification 1:2 Mnih et al [2015] Mnih V, Kavukcuoglu K, Silver D, et al (2015) Human-level control through deep reinforcement learning. nature 518(7540):529–533 MORIMURA [2010] MORIMURA T (2010) Parametric return density estimation for reinforcement learning. In: Conference on Uncertainty in Artificial Intelligence, 2010 Nam et al [2021] Nam DW, Kim Y, Park CY (2021) Gmac: A distributional perspective on actor-critic framework. In: ICML 2021, PMLR, pp 7927–7936 Ororbia and Kifer [2022] Ororbia A, Kifer D (2022) The neural coding framework for learning generative models. Nature communications 13(1):2064 Parno and Marzouk [2018] Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. 
Carnegie Mellon University Kingma DP, Welling M (2013) Auto-encoding variational bayes. arXiv preprint arXiv:13126114 Kingma et al [2014] Kingma DP, Mohamed S, Jimenez Rezende D, et al (2014) Semi-supervised learning with deep generative models. NIPS 2014 27 Konda and Tsitsiklis [1999] Konda V, Tsitsiklis J (1999) Actor-critic algorithms. Advances in neural information processing systems 12 Kuznetsov et al [2020] Kuznetsov A, Shvechikov P, Grishin A, et al (2020) Controlling overestimation bias with truncated mixture of continuous distributional quantile critics. In: International Conference on Machine Learning, PMLR, pp 5556–5566 Lillicrap et al [2015] Lillicrap TP, Hunt JJ, Pritzel A, et al (2015) Continuous control with deep reinforcement learning. arXiv preprint arXiv:150902971 Lindenberg et al [2022] Lindenberg B, Nordqvist J, Lindahl KO (2022) Conjugated discrete distributions for distributional reinforcement learning. In: AAAI 2022, pp 7516–7524 Ma et al [2020] Ma X, Xia L, Zhou Z, et al (2020) Dsac: distributional soft actor critic for risk-sensitive reinforcement learning. arXiv preprint arXiv:200414547 Marzouk et al [2016] Marzouk Y, Moselhy T, Parno M, et al (2016) Sampling via measure transport: An introduction. Handbook of uncertainty quantification 1:2 Mnih et al [2015] Mnih V, Kavukcuoglu K, Silver D, et al (2015) Human-level control through deep reinforcement learning. nature 518(7540):529–533 MORIMURA [2010] MORIMURA T (2010) Parametric return density estimation for reinforcement learning. In: Conference on Uncertainty in Artificial Intelligence, 2010 Nam et al [2021] Nam DW, Kim Y, Park CY (2021) Gmac: A distributional perspective on actor-critic framework. In: ICML 2021, PMLR, pp 7927–7936 Ororbia and Kifer [2022] Ororbia A, Kifer D (2022) The neural coding framework for learning generative models. Nature communications 13(1):2064 Parno and Marzouk [2018] Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. 
In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Kingma DP, Mohamed S, Jimenez Rezende D, et al (2014) Semi-supervised learning with deep generative models. NIPS 2014 27 Konda and Tsitsiklis [1999] Konda V, Tsitsiklis J (1999) Actor-critic algorithms. Advances in neural information processing systems 12 Kuznetsov et al [2020] Kuznetsov A, Shvechikov P, Grishin A, et al (2020) Controlling overestimation bias with truncated mixture of continuous distributional quantile critics. In: International Conference on Machine Learning, PMLR, pp 5556–5566 Lillicrap et al [2015] Lillicrap TP, Hunt JJ, Pritzel A, et al (2015) Continuous control with deep reinforcement learning. arXiv preprint arXiv:150902971 Lindenberg et al [2022] Lindenberg B, Nordqvist J, Lindahl KO (2022) Conjugated discrete distributions for distributional reinforcement learning. In: AAAI 2022, pp 7516–7524 Ma et al [2020] Ma X, Xia L, Zhou Z, et al (2020) Dsac: distributional soft actor critic for risk-sensitive reinforcement learning. arXiv preprint arXiv:200414547 Marzouk et al [2016] Marzouk Y, Moselhy T, Parno M, et al (2016) Sampling via measure transport: An introduction. Handbook of uncertainty quantification 1:2 Mnih et al [2015] Mnih V, Kavukcuoglu K, Silver D, et al (2015) Human-level control through deep reinforcement learning. nature 518(7540):529–533 MORIMURA [2010] MORIMURA T (2010) Parametric return density estimation for reinforcement learning. In: Conference on Uncertainty in Artificial Intelligence, 2010 Nam et al [2021] Nam DW, Kim Y, Park CY (2021) Gmac: A distributional perspective on actor-critic framework. In: ICML 2021, PMLR, pp 7927–7936 Ororbia and Kifer [2022] Ororbia A, Kifer D (2022) The neural coding framework for learning generative models. Nature communications 13(1):2064 Parno and Marzouk [2018] Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. 
Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Konda V, Tsitsiklis J (1999) Actor-critic algorithms. Advances in neural information processing systems 12 Kuznetsov et al [2020] Kuznetsov A, Shvechikov P, Grishin A, et al (2020) Controlling overestimation bias with truncated mixture of continuous distributional quantile critics. In: International Conference on Machine Learning, PMLR, pp 5556–5566 Lillicrap et al [2015] Lillicrap TP, Hunt JJ, Pritzel A, et al (2015) Continuous control with deep reinforcement learning. arXiv preprint arXiv:150902971 Lindenberg et al [2022] Lindenberg B, Nordqvist J, Lindahl KO (2022) Conjugated discrete distributions for distributional reinforcement learning. In: AAAI 2022, pp 7516–7524 Ma et al [2020] Ma X, Xia L, Zhou Z, et al (2020) Dsac: distributional soft actor critic for risk-sensitive reinforcement learning. arXiv preprint arXiv:200414547 Marzouk et al [2016] Marzouk Y, Moselhy T, Parno M, et al (2016) Sampling via measure transport: An introduction. Handbook of uncertainty quantification 1:2 Mnih et al [2015] Mnih V, Kavukcuoglu K, Silver D, et al (2015) Human-level control through deep reinforcement learning. nature 518(7540):529–533 MORIMURA [2010] MORIMURA T (2010) Parametric return density estimation for reinforcement learning. In: Conference on Uncertainty in Artificial Intelligence, 2010 Nam et al [2021] Nam DW, Kim Y, Park CY (2021) Gmac: A distributional perspective on actor-critic framework. In: ICML 2021, PMLR, pp 7927–7936 Ororbia and Kifer [2022] Ororbia A, Kifer D (2022) The neural coding framework for learning generative models. Nature communications 13(1):2064 Parno and Marzouk [2018] Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. 
SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Kuznetsov A, Shvechikov P, Grishin A, et al (2020) Controlling overestimation bias with truncated mixture of continuous distributional quantile critics. In: International Conference on Machine Learning, PMLR, pp 5556–5566 Lillicrap et al [2015] Lillicrap TP, Hunt JJ, Pritzel A, et al (2015) Continuous control with deep reinforcement learning. arXiv preprint arXiv:150902971 Lindenberg et al [2022] Lindenberg B, Nordqvist J, Lindahl KO (2022) Conjugated discrete distributions for distributional reinforcement learning. In: AAAI 2022, pp 7516–7524 Ma et al [2020] Ma X, Xia L, Zhou Z, et al (2020) Dsac: distributional soft actor critic for risk-sensitive reinforcement learning. arXiv preprint arXiv:200414547 Marzouk et al [2016] Marzouk Y, Moselhy T, Parno M, et al (2016) Sampling via measure transport: An introduction. Handbook of uncertainty quantification 1:2 Mnih et al [2015] Mnih V, Kavukcuoglu K, Silver D, et al (2015) Human-level control through deep reinforcement learning. 
nature 518(7540):529–533 MORIMURA [2010] MORIMURA T (2010) Parametric return density estimation for reinforcement learning. In: Conference on Uncertainty in Artificial Intelligence, 2010 Nam et al [2021] Nam DW, Kim Y, Park CY (2021) Gmac: A distributional perspective on actor-critic framework. In: ICML 2021, PMLR, pp 7927–7936 Ororbia and Kifer [2022] Ororbia A, Kifer D (2022) The neural coding framework for learning generative models. Nature communications 13(1):2064 Parno and Marzouk [2018] Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Lillicrap TP, Hunt JJ, Pritzel A, et al (2015) Continuous control with deep reinforcement learning. arXiv preprint arXiv:150902971 Lindenberg et al [2022] Lindenberg B, Nordqvist J, Lindahl KO (2022) Conjugated discrete distributions for distributional reinforcement learning. In: AAAI 2022, pp 7516–7524 Ma et al [2020] Ma X, Xia L, Zhou Z, et al (2020) Dsac: distributional soft actor critic for risk-sensitive reinforcement learning. 
arXiv preprint arXiv:200414547 Marzouk et al [2016] Marzouk Y, Moselhy T, Parno M, et al (2016) Sampling via measure transport: An introduction. Handbook of uncertainty quantification 1:2 Mnih et al [2015] Mnih V, Kavukcuoglu K, Silver D, et al (2015) Human-level control through deep reinforcement learning. nature 518(7540):529–533 MORIMURA [2010] MORIMURA T (2010) Parametric return density estimation for reinforcement learning. In: Conference on Uncertainty in Artificial Intelligence, 2010 Nam et al [2021] Nam DW, Kim Y, Park CY (2021) Gmac: A distributional perspective on actor-critic framework. In: ICML 2021, PMLR, pp 7927–7936 Ororbia and Kifer [2022] Ororbia A, Kifer D (2022) The neural coding framework for learning generative models. Nature communications 13(1):2064 Parno and Marzouk [2018] Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Lindenberg B, Nordqvist J, Lindahl KO (2022) Conjugated discrete distributions for distributional reinforcement learning. 
In: AAAI 2022, pp 7516–7524 Ma et al [2020] Ma X, Xia L, Zhou Z, et al (2020) Dsac: distributional soft actor critic for risk-sensitive reinforcement learning. arXiv preprint arXiv:200414547 Marzouk et al [2016] Marzouk Y, Moselhy T, Parno M, et al (2016) Sampling via measure transport: An introduction. Handbook of uncertainty quantification 1:2 Mnih et al [2015] Mnih V, Kavukcuoglu K, Silver D, et al (2015) Human-level control through deep reinforcement learning. nature 518(7540):529–533 MORIMURA [2010] MORIMURA T (2010) Parametric return density estimation for reinforcement learning. In: Conference on Uncertainty in Artificial Intelligence, 2010 Nam et al [2021] Nam DW, Kim Y, Park CY (2021) Gmac: A distributional perspective on actor-critic framework. In: ICML 2021, PMLR, pp 7927–7936 Ororbia and Kifer [2022] Ororbia A, Kifer D (2022) The neural coding framework for learning generative models. Nature communications 13(1):2064 Parno and Marzouk [2018] Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. 
Carnegie Mellon University Ma X, Xia L, Zhou Z, et al (2020) Dsac: distributional soft actor critic for risk-sensitive reinforcement learning. arXiv preprint arXiv:200414547 Marzouk et al [2016] Marzouk Y, Moselhy T, Parno M, et al (2016) Sampling via measure transport: An introduction. Handbook of uncertainty quantification 1:2 Mnih et al [2015] Mnih V, Kavukcuoglu K, Silver D, et al (2015) Human-level control through deep reinforcement learning. nature 518(7540):529–533 MORIMURA [2010] MORIMURA T (2010) Parametric return density estimation for reinforcement learning. In: Conference on Uncertainty in Artificial Intelligence, 2010 Nam et al [2021] Nam DW, Kim Y, Park CY (2021) Gmac: A distributional perspective on actor-critic framework. In: ICML 2021, PMLR, pp 7927–7936 Ororbia and Kifer [2022] Ororbia A, Kifer D (2022) The neural coding framework for learning generative models. Nature communications 13(1):2064 Parno and Marzouk [2018] Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. 
Carnegie Mellon University
In: Conference on Uncertainty in Artificial Intelligence, 2010 Nam et al [2021] Nam DW, Kim Y, Park CY (2021) Gmac: A distributional perspective on actor-critic framework. In: ICML 2021, PMLR, pp 7927–7936 Ororbia and Kifer [2022] Ororbia A, Kifer D (2022) The neural coding framework for learning generative models. Nature communications 13(1):2064 Parno and Marzouk [2018] Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Chow Y, Ghavamzadeh M, Janson L, et al (2017) Risk-constrained reinforcement learning with percentile risk criteria. J Mach Learn Res 18(1):6070–6120 Creswell et al [2018] Creswell A, White T, Dumoulin V, et al (2018) Generative adversarial networks: An overview. IEEE signal processing magazine 35(1):53–65 Dabney et al [2018a] Dabney W, Ostrovski G, Silver D, et al (2018a) Implicit quantile networks for distributional reinforcement learning. In: ICML 2018, PMLR, pp 1096–1105 Dabney et al [2018b] Dabney W, Rowland M, Bellemare M, et al (2018b) Distributional reinforcement learning with quantile regression. 
In: AAAI 2018 Duan et al [2021] Duan J, Guan Y, Li SE, et al (2021) Distributional soft actor-critic: Off-policy reinforcement learning for addressing value estimation errors. IEEE transactions on neural networks and learning systems Engel et al [2005] Engel Y, Mannor S, Meir R (2005) Reinforcement learning with gaussian processes. In: Proceedings of the 22nd international conference on Machine learning, pp 201–208 Fujimoto et al [2018] Fujimoto S, Hoof H, Meger D (2018) Addressing function approximation error in actor-critic methods. In: ICML 2018, PMLR, pp 1587–1596 Goodfellow et al [2020] Goodfellow I, Pouget-Abadie J, Mirza M, et al (2020) Generative adversarial networks. Communications of the ACM 63(11):139–144 Haarnoja et al [2017] Haarnoja T, Tang H, Abbeel P, et al (2017) Reinforcement learning with deep energy-based policies. In: ICML 2017, PMLR, pp 1352–1361 Haarnoja et al [2018] Haarnoja T, Zhou A, Hartikainen K, et al (2018) Soft actor-critic algorithms and applications. arXiv preprint arXiv:181205905 Heess et al [2015] Heess N, Wayne G, Silver D, et al (2015) Learning continuous control policies by stochastic value gradients. Advances in neural information processing systems 28 Kingma and Welling [2013] Kingma DP, Welling M (2013) Auto-encoding variational bayes. arXiv preprint arXiv:13126114 Kingma et al [2014] Kingma DP, Mohamed S, Jimenez Rezende D, et al (2014) Semi-supervised learning with deep generative models. NIPS 2014 27 Konda and Tsitsiklis [1999] Konda V, Tsitsiklis J (1999) Actor-critic algorithms. Advances in neural information processing systems 12 Kuznetsov et al [2020] Kuznetsov A, Shvechikov P, Grishin A, et al (2020) Controlling overestimation bias with truncated mixture of continuous distributional quantile critics. In: International Conference on Machine Learning, PMLR, pp 5556–5566 Lillicrap et al [2015] Lillicrap TP, Hunt JJ, Pritzel A, et al (2015) Continuous control with deep reinforcement learning. arXiv preprint arXiv:150902971 Lindenberg et al [2022] Lindenberg B, Nordqvist J, Lindahl KO (2022) Conjugated discrete distributions for distributional reinforcement learning. In: AAAI 2022, pp 7516–7524 Ma et al [2020] Ma X, Xia L, Zhou Z, et al (2020) Dsac: distributional soft actor critic for risk-sensitive reinforcement learning. arXiv preprint arXiv:200414547 Marzouk et al [2016] Marzouk Y, Moselhy T, Parno M, et al (2016) Sampling via measure transport: An introduction. Handbook of uncertainty quantification 1:2 Mnih et al [2015] Mnih V, Kavukcuoglu K, Silver D, et al (2015) Human-level control through deep reinforcement learning. nature 518(7540):529–533 MORIMURA [2010] MORIMURA T (2010) Parametric return density estimation for reinforcement learning. In: Conference on Uncertainty in Artificial Intelligence, 2010 Nam et al [2021] Nam DW, Kim Y, Park CY (2021) Gmac: A distributional perspective on actor-critic framework. In: ICML 2021, PMLR, pp 7927–7936 Ororbia and Kifer [2022] Ororbia A, Kifer D (2022) The neural coding framework for learning generative models. Nature communications 13(1):2064 Parno and Marzouk [2018] Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. 
Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Creswell A, White T, Dumoulin V, et al (2018) Generative adversarial networks: An overview. IEEE signal processing magazine 35(1):53–65 Dabney et al [2018a] Dabney W, Ostrovski G, Silver D, et al (2018a) Implicit quantile networks for distributional reinforcement learning. In: ICML 2018, PMLR, pp 1096–1105 Dabney et al [2018b] Dabney W, Rowland M, Bellemare M, et al (2018b) Distributional reinforcement learning with quantile regression. In: AAAI 2018 Duan et al [2021] Duan J, Guan Y, Li SE, et al (2021) Distributional soft actor-critic: Off-policy reinforcement learning for addressing value estimation errors. IEEE transactions on neural networks and learning systems Engel et al [2005] Engel Y, Mannor S, Meir R (2005) Reinforcement learning with gaussian processes. In: Proceedings of the 22nd international conference on Machine learning, pp 201–208 Fujimoto et al [2018] Fujimoto S, Hoof H, Meger D (2018) Addressing function approximation error in actor-critic methods. In: ICML 2018, PMLR, pp 1587–1596 Goodfellow et al [2020] Goodfellow I, Pouget-Abadie J, Mirza M, et al (2020) Generative adversarial networks. Communications of the ACM 63(11):139–144 Haarnoja et al [2017] Haarnoja T, Tang H, Abbeel P, et al (2017) Reinforcement learning with deep energy-based policies. In: ICML 2017, PMLR, pp 1352–1361 Haarnoja et al [2018] Haarnoja T, Zhou A, Hartikainen K, et al (2018) Soft actor-critic algorithms and applications. 
arXiv preprint arXiv:181205905 Heess et al [2015] Heess N, Wayne G, Silver D, et al (2015) Learning continuous control policies by stochastic value gradients. Advances in neural information processing systems 28 Kingma and Welling [2013] Kingma DP, Welling M (2013) Auto-encoding variational bayes. arXiv preprint arXiv:13126114 Kingma et al [2014] Kingma DP, Mohamed S, Jimenez Rezende D, et al (2014) Semi-supervised learning with deep generative models. NIPS 2014 27 Konda and Tsitsiklis [1999] Konda V, Tsitsiklis J (1999) Actor-critic algorithms. Advances in neural information processing systems 12 Kuznetsov et al [2020] Kuznetsov A, Shvechikov P, Grishin A, et al (2020) Controlling overestimation bias with truncated mixture of continuous distributional quantile critics. In: International Conference on Machine Learning, PMLR, pp 5556–5566 Lillicrap et al [2015] Lillicrap TP, Hunt JJ, Pritzel A, et al (2015) Continuous control with deep reinforcement learning. arXiv preprint arXiv:150902971 Lindenberg et al [2022] Lindenberg B, Nordqvist J, Lindahl KO (2022) Conjugated discrete distributions for distributional reinforcement learning. In: AAAI 2022, pp 7516–7524 Ma et al [2020] Ma X, Xia L, Zhou Z, et al (2020) Dsac: distributional soft actor critic for risk-sensitive reinforcement learning. arXiv preprint arXiv:200414547 Marzouk et al [2016] Marzouk Y, Moselhy T, Parno M, et al (2016) Sampling via measure transport: An introduction. Handbook of uncertainty quantification 1:2 Mnih et al [2015] Mnih V, Kavukcuoglu K, Silver D, et al (2015) Human-level control through deep reinforcement learning. nature 518(7540):529–533 MORIMURA [2010] MORIMURA T (2010) Parametric return density estimation for reinforcement learning. In: Conference on Uncertainty in Artificial Intelligence, 2010 Nam et al [2021] Nam DW, Kim Y, Park CY (2021) Gmac: A distributional perspective on actor-critic framework. In: ICML 2021, PMLR, pp 7927–7936 Ororbia and Kifer [2022] Ororbia A, Kifer D (2022) The neural coding framework for learning generative models. Nature communications 13(1):2064 Parno and Marzouk [2018] Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. 
arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Dabney W, Ostrovski G, Silver D, et al (2018a) Implicit quantile networks for distributional reinforcement learning. In: ICML 2018, PMLR, pp 1096–1105 Dabney et al [2018b] Dabney W, Rowland M, Bellemare M, et al (2018b) Distributional reinforcement learning with quantile regression. In: AAAI 2018 Duan et al [2021] Duan J, Guan Y, Li SE, et al (2021) Distributional soft actor-critic: Off-policy reinforcement learning for addressing value estimation errors. IEEE transactions on neural networks and learning systems Engel et al [2005] Engel Y, Mannor S, Meir R (2005) Reinforcement learning with gaussian processes. In: Proceedings of the 22nd international conference on Machine learning, pp 201–208 Fujimoto et al [2018] Fujimoto S, Hoof H, Meger D (2018) Addressing function approximation error in actor-critic methods. In: ICML 2018, PMLR, pp 1587–1596 Goodfellow et al [2020] Goodfellow I, Pouget-Abadie J, Mirza M, et al (2020) Generative adversarial networks. Communications of the ACM 63(11):139–144 Haarnoja et al [2017] Haarnoja T, Tang H, Abbeel P, et al (2017) Reinforcement learning with deep energy-based policies. In: ICML 2017, PMLR, pp 1352–1361 Haarnoja et al [2018] Haarnoja T, Zhou A, Hartikainen K, et al (2018) Soft actor-critic algorithms and applications. arXiv preprint arXiv:181205905 Heess et al [2015] Heess N, Wayne G, Silver D, et al (2015) Learning continuous control policies by stochastic value gradients. Advances in neural information processing systems 28 Kingma and Welling [2013] Kingma DP, Welling M (2013) Auto-encoding variational bayes. arXiv preprint arXiv:13126114 Kingma et al [2014] Kingma DP, Mohamed S, Jimenez Rezende D, et al (2014) Semi-supervised learning with deep generative models. NIPS 2014 27 Konda and Tsitsiklis [1999] Konda V, Tsitsiklis J (1999) Actor-critic algorithms. Advances in neural information processing systems 12 Kuznetsov et al [2020] Kuznetsov A, Shvechikov P, Grishin A, et al (2020) Controlling overestimation bias with truncated mixture of continuous distributional quantile critics. In: International Conference on Machine Learning, PMLR, pp 5556–5566 Lillicrap et al [2015] Lillicrap TP, Hunt JJ, Pritzel A, et al (2015) Continuous control with deep reinforcement learning. 
arXiv preprint arXiv:150902971 Lindenberg et al [2022] Lindenberg B, Nordqvist J, Lindahl KO (2022) Conjugated discrete distributions for distributional reinforcement learning. In: AAAI 2022, pp 7516–7524 Ma et al [2020] Ma X, Xia L, Zhou Z, et al (2020) Dsac: distributional soft actor critic for risk-sensitive reinforcement learning. arXiv preprint arXiv:200414547 Marzouk et al [2016] Marzouk Y, Moselhy T, Parno M, et al (2016) Sampling via measure transport: An introduction. Handbook of uncertainty quantification 1:2 Mnih et al [2015] Mnih V, Kavukcuoglu K, Silver D, et al (2015) Human-level control through deep reinforcement learning. nature 518(7540):529–533 MORIMURA [2010] MORIMURA T (2010) Parametric return density estimation for reinforcement learning. In: Conference on Uncertainty in Artificial Intelligence, 2010 Nam et al [2021] Nam DW, Kim Y, Park CY (2021) Gmac: A distributional perspective on actor-critic framework. In: ICML 2021, PMLR, pp 7927–7936 Ororbia and Kifer [2022] Ororbia A, Kifer D (2022) The neural coding framework for learning generative models. Nature communications 13(1):2064 Parno and Marzouk [2018] Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. 
Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Dabney W, Rowland M, Bellemare M, et al (2018b) Distributional reinforcement learning with quantile regression. In: AAAI 2018 Duan et al [2021] Duan J, Guan Y, Li SE, et al (2021) Distributional soft actor-critic: Off-policy reinforcement learning for addressing value estimation errors. IEEE transactions on neural networks and learning systems Engel et al [2005] Engel Y, Mannor S, Meir R (2005) Reinforcement learning with gaussian processes. In: Proceedings of the 22nd international conference on Machine learning, pp 201–208 Fujimoto et al [2018] Fujimoto S, Hoof H, Meger D (2018) Addressing function approximation error in actor-critic methods. In: ICML 2018, PMLR, pp 1587–1596 Goodfellow et al [2020] Goodfellow I, Pouget-Abadie J, Mirza M, et al (2020) Generative adversarial networks. Communications of the ACM 63(11):139–144 Haarnoja et al [2017] Haarnoja T, Tang H, Abbeel P, et al (2017) Reinforcement learning with deep energy-based policies. In: ICML 2017, PMLR, pp 1352–1361 Haarnoja et al [2018] Haarnoja T, Zhou A, Hartikainen K, et al (2018) Soft actor-critic algorithms and applications. arXiv preprint arXiv:181205905 Heess et al [2015] Heess N, Wayne G, Silver D, et al (2015) Learning continuous control policies by stochastic value gradients. Advances in neural information processing systems 28 Kingma and Welling [2013] Kingma DP, Welling M (2013) Auto-encoding variational bayes. arXiv preprint arXiv:13126114 Kingma et al [2014] Kingma DP, Mohamed S, Jimenez Rezende D, et al (2014) Semi-supervised learning with deep generative models. NIPS 2014 27 Konda and Tsitsiklis [1999] Konda V, Tsitsiklis J (1999) Actor-critic algorithms. Advances in neural information processing systems 12 Kuznetsov et al [2020] Kuznetsov A, Shvechikov P, Grishin A, et al (2020) Controlling overestimation bias with truncated mixture of continuous distributional quantile critics. In: International Conference on Machine Learning, PMLR, pp 5556–5566 Lillicrap et al [2015] Lillicrap TP, Hunt JJ, Pritzel A, et al (2015) Continuous control with deep reinforcement learning. arXiv preprint arXiv:150902971 Lindenberg et al [2022] Lindenberg B, Nordqvist J, Lindahl KO (2022) Conjugated discrete distributions for distributional reinforcement learning. In: AAAI 2022, pp 7516–7524 Ma et al [2020] Ma X, Xia L, Zhou Z, et al (2020) Dsac: distributional soft actor critic for risk-sensitive reinforcement learning. arXiv preprint arXiv:200414547 Marzouk et al [2016] Marzouk Y, Moselhy T, Parno M, et al (2016) Sampling via measure transport: An introduction. Handbook of uncertainty quantification 1:2 Mnih et al [2015] Mnih V, Kavukcuoglu K, Silver D, et al (2015) Human-level control through deep reinforcement learning. nature 518(7540):529–533 MORIMURA [2010] MORIMURA T (2010) Parametric return density estimation for reinforcement learning. In: Conference on Uncertainty in Artificial Intelligence, 2010 Nam et al [2021] Nam DW, Kim Y, Park CY (2021) Gmac: A distributional perspective on actor-critic framework. In: ICML 2021, PMLR, pp 7927–7936 Ororbia and Kifer [2022] Ororbia A, Kifer D (2022) The neural coding framework for learning generative models. Nature communications 13(1):2064 Parno and Marzouk [2018] Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. 
SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Duan J, Guan Y, Li SE, et al (2021) Distributional soft actor-critic: Off-policy reinforcement learning for addressing value estimation errors. IEEE transactions on neural networks and learning systems Engel et al [2005] Engel Y, Mannor S, Meir R (2005) Reinforcement learning with gaussian processes. In: Proceedings of the 22nd international conference on Machine learning, pp 201–208 Fujimoto et al [2018] Fujimoto S, Hoof H, Meger D (2018) Addressing function approximation error in actor-critic methods. In: ICML 2018, PMLR, pp 1587–1596 Goodfellow et al [2020] Goodfellow I, Pouget-Abadie J, Mirza M, et al (2020) Generative adversarial networks. Communications of the ACM 63(11):139–144 Haarnoja et al [2017] Haarnoja T, Tang H, Abbeel P, et al (2017) Reinforcement learning with deep energy-based policies. In: ICML 2017, PMLR, pp 1352–1361 Haarnoja et al [2018] Haarnoja T, Zhou A, Hartikainen K, et al (2018) Soft actor-critic algorithms and applications. 
arXiv preprint arXiv:181205905 Heess et al [2015] Heess N, Wayne G, Silver D, et al (2015) Learning continuous control policies by stochastic value gradients. Advances in neural information processing systems 28 Kingma and Welling [2013] Kingma DP, Welling M (2013) Auto-encoding variational bayes. arXiv preprint arXiv:13126114 Kingma et al [2014] Kingma DP, Mohamed S, Jimenez Rezende D, et al (2014) Semi-supervised learning with deep generative models. NIPS 2014 27 Konda and Tsitsiklis [1999] Konda V, Tsitsiklis J (1999) Actor-critic algorithms. Advances in neural information processing systems 12 Kuznetsov et al [2020] Kuznetsov A, Shvechikov P, Grishin A, et al (2020) Controlling overestimation bias with truncated mixture of continuous distributional quantile critics. In: International Conference on Machine Learning, PMLR, pp 5556–5566 Lillicrap et al [2015] Lillicrap TP, Hunt JJ, Pritzel A, et al (2015) Continuous control with deep reinforcement learning. arXiv preprint arXiv:150902971 Lindenberg et al [2022] Lindenberg B, Nordqvist J, Lindahl KO (2022) Conjugated discrete distributions for distributional reinforcement learning. In: AAAI 2022, pp 7516–7524 Ma et al [2020] Ma X, Xia L, Zhou Z, et al (2020) Dsac: distributional soft actor critic for risk-sensitive reinforcement learning. arXiv preprint arXiv:200414547 Marzouk et al [2016] Marzouk Y, Moselhy T, Parno M, et al (2016) Sampling via measure transport: An introduction. Handbook of uncertainty quantification 1:2 Mnih et al [2015] Mnih V, Kavukcuoglu K, Silver D, et al (2015) Human-level control through deep reinforcement learning. nature 518(7540):529–533 MORIMURA [2010] MORIMURA T (2010) Parametric return density estimation for reinforcement learning. In: Conference on Uncertainty in Artificial Intelligence, 2010 Nam et al [2021] Nam DW, Kim Y, Park CY (2021) Gmac: A distributional perspective on actor-critic framework. In: ICML 2021, PMLR, pp 7927–7936 Ororbia and Kifer [2022] Ororbia A, Kifer D (2022) The neural coding framework for learning generative models. Nature communications 13(1):2064 Parno and Marzouk [2018] Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. 
arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Engel Y, Mannor S, Meir R (2005) Reinforcement learning with gaussian processes. In: Proceedings of the 22nd international conference on Machine learning, pp 201–208 Fujimoto et al [2018] Fujimoto S, Hoof H, Meger D (2018) Addressing function approximation error in actor-critic methods. In: ICML 2018, PMLR, pp 1587–1596 Goodfellow et al [2020] Goodfellow I, Pouget-Abadie J, Mirza M, et al (2020) Generative adversarial networks. Communications of the ACM 63(11):139–144 Haarnoja et al [2017] Haarnoja T, Tang H, Abbeel P, et al (2017) Reinforcement learning with deep energy-based policies. In: ICML 2017, PMLR, pp 1352–1361 Haarnoja et al [2018] Haarnoja T, Zhou A, Hartikainen K, et al (2018) Soft actor-critic algorithms and applications. arXiv preprint arXiv:181205905 Heess et al [2015] Heess N, Wayne G, Silver D, et al (2015) Learning continuous control policies by stochastic value gradients. Advances in neural information processing systems 28 Kingma and Welling [2013] Kingma DP, Welling M (2013) Auto-encoding variational bayes. arXiv preprint arXiv:13126114 Kingma et al [2014] Kingma DP, Mohamed S, Jimenez Rezende D, et al (2014) Semi-supervised learning with deep generative models. NIPS 2014 27 Konda and Tsitsiklis [1999] Konda V, Tsitsiklis J (1999) Actor-critic algorithms. Advances in neural information processing systems 12 Kuznetsov et al [2020] Kuznetsov A, Shvechikov P, Grishin A, et al (2020) Controlling overestimation bias with truncated mixture of continuous distributional quantile critics. In: International Conference on Machine Learning, PMLR, pp 5556–5566 Lillicrap et al [2015] Lillicrap TP, Hunt JJ, Pritzel A, et al (2015) Continuous control with deep reinforcement learning. arXiv preprint arXiv:150902971 Lindenberg et al [2022] Lindenberg B, Nordqvist J, Lindahl KO (2022) Conjugated discrete distributions for distributional reinforcement learning. In: AAAI 2022, pp 7516–7524 Ma et al [2020] Ma X, Xia L, Zhou Z, et al (2020) Dsac: distributional soft actor critic for risk-sensitive reinforcement learning. arXiv preprint arXiv:200414547 Marzouk et al [2016] Marzouk Y, Moselhy T, Parno M, et al (2016) Sampling via measure transport: An introduction. 
Handbook of uncertainty quantification 1:2 Mnih et al [2015] Mnih V, Kavukcuoglu K, Silver D, et al (2015) Human-level control through deep reinforcement learning. nature 518(7540):529–533 MORIMURA [2010] MORIMURA T (2010) Parametric return density estimation for reinforcement learning. In: Conference on Uncertainty in Artificial Intelligence, 2010 Nam et al [2021] Nam DW, Kim Y, Park CY (2021) Gmac: A distributional perspective on actor-critic framework. In: ICML 2021, PMLR, pp 7927–7936 Ororbia and Kifer [2022] Ororbia A, Kifer D (2022) The neural coding framework for learning generative models. Nature communications 13(1):2064 Parno and Marzouk [2018] Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Fujimoto S, Hoof H, Meger D (2018) Addressing function approximation error in actor-critic methods. In: ICML 2018, PMLR, pp 1587–1596 Goodfellow et al [2020] Goodfellow I, Pouget-Abadie J, Mirza M, et al (2020) Generative adversarial networks. 
Communications of the ACM 63(11):139–144 Haarnoja et al [2017] Haarnoja T, Tang H, Abbeel P, et al (2017) Reinforcement learning with deep energy-based policies. In: ICML 2017, PMLR, pp 1352–1361 Haarnoja et al [2018] Haarnoja T, Zhou A, Hartikainen K, et al (2018) Soft actor-critic algorithms and applications. arXiv preprint arXiv:181205905 Heess et al [2015] Heess N, Wayne G, Silver D, et al (2015) Learning continuous control policies by stochastic value gradients. Advances in neural information processing systems 28 Kingma and Welling [2013] Kingma DP, Welling M (2013) Auto-encoding variational bayes. arXiv preprint arXiv:13126114 Kingma et al [2014] Kingma DP, Mohamed S, Jimenez Rezende D, et al (2014) Semi-supervised learning with deep generative models. NIPS 2014 27 Konda and Tsitsiklis [1999] Konda V, Tsitsiklis J (1999) Actor-critic algorithms. Advances in neural information processing systems 12 Kuznetsov et al [2020] Kuznetsov A, Shvechikov P, Grishin A, et al (2020) Controlling overestimation bias with truncated mixture of continuous distributional quantile critics. In: International Conference on Machine Learning, PMLR, pp 5556–5566 Lillicrap et al [2015] Lillicrap TP, Hunt JJ, Pritzel A, et al (2015) Continuous control with deep reinforcement learning. arXiv preprint arXiv:150902971 Lindenberg et al [2022] Lindenberg B, Nordqvist J, Lindahl KO (2022) Conjugated discrete distributions for distributional reinforcement learning. In: AAAI 2022, pp 7516–7524 Ma et al [2020] Ma X, Xia L, Zhou Z, et al (2020) Dsac: distributional soft actor critic for risk-sensitive reinforcement learning. arXiv preprint arXiv:200414547 Marzouk et al [2016] Marzouk Y, Moselhy T, Parno M, et al (2016) Sampling via measure transport: An introduction. Handbook of uncertainty quantification 1:2 Mnih et al [2015] Mnih V, Kavukcuoglu K, Silver D, et al (2015) Human-level control through deep reinforcement learning. nature 518(7540):529–533 MORIMURA [2010] MORIMURA T (2010) Parametric return density estimation for reinforcement learning. In: Conference on Uncertainty in Artificial Intelligence, 2010 Nam et al [2021] Nam DW, Kim Y, Park CY (2021) Gmac: A distributional perspective on actor-critic framework. In: ICML 2021, PMLR, pp 7927–7936 Ororbia and Kifer [2022] Ororbia A, Kifer D (2022) The neural coding framework for learning generative models. Nature communications 13(1):2064 Parno and Marzouk [2018] Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. 
Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Goodfellow I, Pouget-Abadie J, Mirza M, et al (2020) Generative adversarial networks. Communications of the ACM 63(11):139–144 Haarnoja et al [2017] Haarnoja T, Tang H, Abbeel P, et al (2017) Reinforcement learning with deep energy-based policies. In: ICML 2017, PMLR, pp 1352–1361 Haarnoja et al [2018] Haarnoja T, Zhou A, Hartikainen K, et al (2018) Soft actor-critic algorithms and applications. arXiv preprint arXiv:181205905 Heess et al [2015] Heess N, Wayne G, Silver D, et al (2015) Learning continuous control policies by stochastic value gradients. Advances in neural information processing systems 28 Kingma and Welling [2013] Kingma DP, Welling M (2013) Auto-encoding variational bayes. arXiv preprint arXiv:13126114 Kingma et al [2014] Kingma DP, Mohamed S, Jimenez Rezende D, et al (2014) Semi-supervised learning with deep generative models. NIPS 2014 27 Konda and Tsitsiklis [1999] Konda V, Tsitsiklis J (1999) Actor-critic algorithms. Advances in neural information processing systems 12 Kuznetsov et al [2020] Kuznetsov A, Shvechikov P, Grishin A, et al (2020) Controlling overestimation bias with truncated mixture of continuous distributional quantile critics. In: International Conference on Machine Learning, PMLR, pp 5556–5566 Lillicrap et al [2015] Lillicrap TP, Hunt JJ, Pritzel A, et al (2015) Continuous control with deep reinforcement learning. arXiv preprint arXiv:150902971 Lindenberg et al [2022] Lindenberg B, Nordqvist J, Lindahl KO (2022) Conjugated discrete distributions for distributional reinforcement learning. In: AAAI 2022, pp 7516–7524 Ma et al [2020] Ma X, Xia L, Zhou Z, et al (2020) Dsac: distributional soft actor critic for risk-sensitive reinforcement learning. arXiv preprint arXiv:200414547 Marzouk et al [2016] Marzouk Y, Moselhy T, Parno M, et al (2016) Sampling via measure transport: An introduction. Handbook of uncertainty quantification 1:2 Mnih et al [2015] Mnih V, Kavukcuoglu K, Silver D, et al (2015) Human-level control through deep reinforcement learning. nature 518(7540):529–533 MORIMURA [2010] MORIMURA T (2010) Parametric return density estimation for reinforcement learning. 
In: Conference on Uncertainty in Artificial Intelligence, 2010 Nam et al [2021] Nam DW, Kim Y, Park CY (2021) Gmac: A distributional perspective on actor-critic framework. In: ICML 2021, PMLR, pp 7927–7936 Ororbia and Kifer [2022] Ororbia A, Kifer D (2022) The neural coding framework for learning generative models. Nature communications 13(1):2064 Parno and Marzouk [2018] Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Haarnoja T, Tang H, Abbeel P, et al (2017) Reinforcement learning with deep energy-based policies. In: ICML 2017, PMLR, pp 1352–1361 Haarnoja et al [2018] Haarnoja T, Zhou A, Hartikainen K, et al (2018) Soft actor-critic algorithms and applications. arXiv preprint arXiv:181205905 Heess et al [2015] Heess N, Wayne G, Silver D, et al (2015) Learning continuous control policies by stochastic value gradients. Advances in neural information processing systems 28 Kingma and Welling [2013] Kingma DP, Welling M (2013) Auto-encoding variational bayes. 
arXiv preprint arXiv:13126114 Kingma et al [2014] Kingma DP, Mohamed S, Jimenez Rezende D, et al (2014) Semi-supervised learning with deep generative models. NIPS 2014 27 Konda and Tsitsiklis [1999] Konda V, Tsitsiklis J (1999) Actor-critic algorithms. Advances in neural information processing systems 12 Kuznetsov et al [2020] Kuznetsov A, Shvechikov P, Grishin A, et al (2020) Controlling overestimation bias with truncated mixture of continuous distributional quantile critics. In: International Conference on Machine Learning, PMLR, pp 5556–5566 Lillicrap et al [2015] Lillicrap TP, Hunt JJ, Pritzel A, et al (2015) Continuous control with deep reinforcement learning. arXiv preprint arXiv:150902971 Lindenberg et al [2022] Lindenberg B, Nordqvist J, Lindahl KO (2022) Conjugated discrete distributions for distributional reinforcement learning. In: AAAI 2022, pp 7516–7524 Ma et al [2020] Ma X, Xia L, Zhou Z, et al (2020) Dsac: distributional soft actor critic for risk-sensitive reinforcement learning. arXiv preprint arXiv:200414547 Marzouk et al [2016] Marzouk Y, Moselhy T, Parno M, et al (2016) Sampling via measure transport: An introduction. Handbook of uncertainty quantification 1:2 Mnih et al [2015] Mnih V, Kavukcuoglu K, Silver D, et al (2015) Human-level control through deep reinforcement learning. nature 518(7540):529–533 MORIMURA [2010] MORIMURA T (2010) Parametric return density estimation for reinforcement learning. In: Conference on Uncertainty in Artificial Intelligence, 2010 Nam et al [2021] Nam DW, Kim Y, Park CY (2021) Gmac: A distributional perspective on actor-critic framework. In: ICML 2021, PMLR, pp 7927–7936 Ororbia and Kifer [2022] Ororbia A, Kifer D (2022) The neural coding framework for learning generative models. Nature communications 13(1):2064 Parno and Marzouk [2018] Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. 
In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Haarnoja T, Zhou A, Hartikainen K, et al (2018) Soft actor-critic algorithms and applications. arXiv preprint arXiv:181205905 Heess et al [2015] Heess N, Wayne G, Silver D, et al (2015) Learning continuous control policies by stochastic value gradients. Advances in neural information processing systems 28 Kingma and Welling [2013] Kingma DP, Welling M (2013) Auto-encoding variational bayes. arXiv preprint arXiv:13126114 Kingma et al [2014] Kingma DP, Mohamed S, Jimenez Rezende D, et al (2014) Semi-supervised learning with deep generative models. NIPS 2014 27 Konda and Tsitsiklis [1999] Konda V, Tsitsiklis J (1999) Actor-critic algorithms. Advances in neural information processing systems 12 Kuznetsov et al [2020] Kuznetsov A, Shvechikov P, Grishin A, et al (2020) Controlling overestimation bias with truncated mixture of continuous distributional quantile critics. In: International Conference on Machine Learning, PMLR, pp 5556–5566 Lillicrap et al [2015] Lillicrap TP, Hunt JJ, Pritzel A, et al (2015) Continuous control with deep reinforcement learning. arXiv preprint arXiv:150902971 Lindenberg et al [2022] Lindenberg B, Nordqvist J, Lindahl KO (2022) Conjugated discrete distributions for distributional reinforcement learning. In: AAAI 2022, pp 7516–7524 Ma et al [2020] Ma X, Xia L, Zhou Z, et al (2020) Dsac: distributional soft actor critic for risk-sensitive reinforcement learning. arXiv preprint arXiv:200414547 Marzouk et al [2016] Marzouk Y, Moselhy T, Parno M, et al (2016) Sampling via measure transport: An introduction. Handbook of uncertainty quantification 1:2 Mnih et al [2015] Mnih V, Kavukcuoglu K, Silver D, et al (2015) Human-level control through deep reinforcement learning. nature 518(7540):529–533 MORIMURA [2010] MORIMURA T (2010) Parametric return density estimation for reinforcement learning. In: Conference on Uncertainty in Artificial Intelligence, 2010 Nam et al [2021] Nam DW, Kim Y, Park CY (2021) Gmac: A distributional perspective on actor-critic framework. In: ICML 2021, PMLR, pp 7927–7936 Ororbia and Kifer [2022] Ororbia A, Kifer D (2022) The neural coding framework for learning generative models. Nature communications 13(1):2064 Parno and Marzouk [2018] Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. 
Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Heess N, Wayne G, Silver D, et al (2015) Learning continuous control policies by stochastic value gradients. Advances in neural information processing systems 28 Kingma and Welling [2013] Kingma DP, Welling M (2013) Auto-encoding variational bayes. arXiv preprint arXiv:13126114 Kingma et al [2014] Kingma DP, Mohamed S, Jimenez Rezende D, et al (2014) Semi-supervised learning with deep generative models. NIPS 2014 27 Konda and Tsitsiklis [1999] Konda V, Tsitsiklis J (1999) Actor-critic algorithms. Advances in neural information processing systems 12 Kuznetsov et al [2020] Kuznetsov A, Shvechikov P, Grishin A, et al (2020) Controlling overestimation bias with truncated mixture of continuous distributional quantile critics. In: International Conference on Machine Learning, PMLR, pp 5556–5566 Lillicrap et al [2015] Lillicrap TP, Hunt JJ, Pritzel A, et al (2015) Continuous control with deep reinforcement learning. arXiv preprint arXiv:150902971 Lindenberg et al [2022] Lindenberg B, Nordqvist J, Lindahl KO (2022) Conjugated discrete distributions for distributional reinforcement learning. In: AAAI 2022, pp 7516–7524 Ma et al [2020] Ma X, Xia L, Zhou Z, et al (2020) Dsac: distributional soft actor critic for risk-sensitive reinforcement learning. 
arXiv preprint arXiv:200414547 Marzouk et al [2016] Marzouk Y, Moselhy T, Parno M, et al (2016) Sampling via measure transport: An introduction. Handbook of uncertainty quantification 1:2 Mnih et al [2015] Mnih V, Kavukcuoglu K, Silver D, et al (2015) Human-level control through deep reinforcement learning. nature 518(7540):529–533 MORIMURA [2010] MORIMURA T (2010) Parametric return density estimation for reinforcement learning. In: Conference on Uncertainty in Artificial Intelligence, 2010 Nam et al [2021] Nam DW, Kim Y, Park CY (2021) Gmac: A distributional perspective on actor-critic framework. In: ICML 2021, PMLR, pp 7927–7936 Ororbia and Kifer [2022] Ororbia A, Kifer D (2022) The neural coding framework for learning generative models. Nature communications 13(1):2064 Parno and Marzouk [2018] Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Kingma DP, Welling M (2013) Auto-encoding variational bayes. arXiv preprint arXiv:13126114 Kingma et al [2014] Kingma DP, Mohamed S, Jimenez Rezende D, et al (2014) Semi-supervised learning with deep generative models. 
NIPS 2014 27 Konda and Tsitsiklis [1999] Konda V, Tsitsiklis J (1999) Actor-critic algorithms. Advances in neural information processing systems 12 Kuznetsov et al [2020] Kuznetsov A, Shvechikov P, Grishin A, et al (2020) Controlling overestimation bias with truncated mixture of continuous distributional quantile critics. In: International Conference on Machine Learning, PMLR, pp 5556–5566 Lillicrap et al [2015] Lillicrap TP, Hunt JJ, Pritzel A, et al (2015) Continuous control with deep reinforcement learning. arXiv preprint arXiv:150902971 Lindenberg et al [2022] Lindenberg B, Nordqvist J, Lindahl KO (2022) Conjugated discrete distributions for distributional reinforcement learning. In: AAAI 2022, pp 7516–7524 Ma et al [2020] Ma X, Xia L, Zhou Z, et al (2020) Dsac: distributional soft actor critic for risk-sensitive reinforcement learning. arXiv preprint arXiv:200414547 Marzouk et al [2016] Marzouk Y, Moselhy T, Parno M, et al (2016) Sampling via measure transport: An introduction. Handbook of uncertainty quantification 1:2 Mnih et al [2015] Mnih V, Kavukcuoglu K, Silver D, et al (2015) Human-level control through deep reinforcement learning. nature 518(7540):529–533 MORIMURA [2010] MORIMURA T (2010) Parametric return density estimation for reinforcement learning. In: Conference on Uncertainty in Artificial Intelligence, 2010 Nam et al [2021] Nam DW, Kim Y, Park CY (2021) Gmac: A distributional perspective on actor-critic framework. In: ICML 2021, PMLR, pp 7927–7936 Ororbia and Kifer [2022] Ororbia A, Kifer D (2022) The neural coding framework for learning generative models. Nature communications 13(1):2064 Parno and Marzouk [2018] Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. 
In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Kingma DP, Mohamed S, Jimenez Rezende D, et al (2014) Semi-supervised learning with deep generative models. NIPS 2014 27 Konda and Tsitsiklis [1999] Konda V, Tsitsiklis J (1999) Actor-critic algorithms. Advances in neural information processing systems 12 Kuznetsov et al [2020] Kuznetsov A, Shvechikov P, Grishin A, et al (2020) Controlling overestimation bias with truncated mixture of continuous distributional quantile critics. In: International Conference on Machine Learning, PMLR, pp 5556–5566 Lillicrap et al [2015] Lillicrap TP, Hunt JJ, Pritzel A, et al (2015) Continuous control with deep reinforcement learning. arXiv preprint arXiv:150902971 Lindenberg et al [2022] Lindenberg B, Nordqvist J, Lindahl KO (2022) Conjugated discrete distributions for distributional reinforcement learning. In: AAAI 2022, pp 7516–7524 Ma et al [2020] Ma X, Xia L, Zhou Z, et al (2020) Dsac: distributional soft actor critic for risk-sensitive reinforcement learning. arXiv preprint arXiv:200414547 Marzouk et al [2016] Marzouk Y, Moselhy T, Parno M, et al (2016) Sampling via measure transport: An introduction. Handbook of uncertainty quantification 1:2 Mnih et al [2015] Mnih V, Kavukcuoglu K, Silver D, et al (2015) Human-level control through deep reinforcement learning. nature 518(7540):529–533 MORIMURA [2010] MORIMURA T (2010) Parametric return density estimation for reinforcement learning. In: Conference on Uncertainty in Artificial Intelligence, 2010 Nam et al [2021] Nam DW, Kim Y, Park CY (2021) Gmac: A distributional perspective on actor-critic framework. In: ICML 2021, PMLR, pp 7927–7936 Ororbia and Kifer [2022] Ororbia A, Kifer D (2022) The neural coding framework for learning generative models. Nature communications 13(1):2064 Parno and Marzouk [2018] Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. 
In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Konda V, Tsitsiklis J (1999) Actor-critic algorithms. Advances in neural information processing systems 12 Kuznetsov et al [2020] Kuznetsov A, Shvechikov P, Grishin A, et al (2020) Controlling overestimation bias with truncated mixture of continuous distributional quantile critics. In: International Conference on Machine Learning, PMLR, pp 5556–5566 Lillicrap et al [2015] Lillicrap TP, Hunt JJ, Pritzel A, et al (2015) Continuous control with deep reinforcement learning. arXiv preprint arXiv:150902971 Lindenberg et al [2022] Lindenberg B, Nordqvist J, Lindahl KO (2022) Conjugated discrete distributions for distributional reinforcement learning. In: AAAI 2022, pp 7516–7524 Ma et al [2020] Ma X, Xia L, Zhou Z, et al (2020) Dsac: distributional soft actor critic for risk-sensitive reinforcement learning. arXiv preprint arXiv:200414547 Marzouk et al [2016] Marzouk Y, Moselhy T, Parno M, et al (2016) Sampling via measure transport: An introduction. Handbook of uncertainty quantification 1:2 Mnih et al [2015] Mnih V, Kavukcuoglu K, Silver D, et al (2015) Human-level control through deep reinforcement learning. nature 518(7540):529–533 MORIMURA [2010] MORIMURA T (2010) Parametric return density estimation for reinforcement learning. In: Conference on Uncertainty in Artificial Intelligence, 2010 Nam et al [2021] Nam DW, Kim Y, Park CY (2021) Gmac: A distributional perspective on actor-critic framework. In: ICML 2021, PMLR, pp 7927–7936 Ororbia and Kifer [2022] Ororbia A, Kifer D (2022) The neural coding framework for learning generative models. Nature communications 13(1):2064 Parno and Marzouk [2018] Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. 
Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Kuznetsov A, Shvechikov P, Grishin A, et al (2020) Controlling overestimation bias with truncated mixture of continuous distributional quantile critics. In: International Conference on Machine Learning, PMLR, pp 5556–5566 Lillicrap et al [2015] Lillicrap TP, Hunt JJ, Pritzel A, et al (2015) Continuous control with deep reinforcement learning. arXiv preprint arXiv:150902971 Lindenberg et al [2022] Lindenberg B, Nordqvist J, Lindahl KO (2022) Conjugated discrete distributions for distributional reinforcement learning. In: AAAI 2022, pp 7516–7524 Ma et al [2020] Ma X, Xia L, Zhou Z, et al (2020) Dsac: distributional soft actor critic for risk-sensitive reinforcement learning. arXiv preprint arXiv:200414547 Marzouk et al [2016] Marzouk Y, Moselhy T, Parno M, et al (2016) Sampling via measure transport: An introduction. Handbook of uncertainty quantification 1:2 Mnih et al [2015] Mnih V, Kavukcuoglu K, Silver D, et al (2015) Human-level control through deep reinforcement learning. nature 518(7540):529–533 MORIMURA [2010] MORIMURA T (2010) Parametric return density estimation for reinforcement learning. In: Conference on Uncertainty in Artificial Intelligence, 2010 Nam et al [2021] Nam DW, Kim Y, Park CY (2021) Gmac: A distributional perspective on actor-critic framework. 
In: ICML 2021, PMLR, pp 7927–7936 Ororbia and Kifer [2022] Ororbia A, Kifer D (2022) The neural coding framework for learning generative models. Nature communications 13(1):2064 Parno and Marzouk [2018] Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Lillicrap TP, Hunt JJ, Pritzel A, et al (2015) Continuous control with deep reinforcement learning. arXiv preprint arXiv:150902971 Lindenberg et al [2022] Lindenberg B, Nordqvist J, Lindahl KO (2022) Conjugated discrete distributions for distributional reinforcement learning. In: AAAI 2022, pp 7516–7524 Ma et al [2020] Ma X, Xia L, Zhou Z, et al (2020) Dsac: distributional soft actor critic for risk-sensitive reinforcement learning. arXiv preprint arXiv:200414547 Marzouk et al [2016] Marzouk Y, Moselhy T, Parno M, et al (2016) Sampling via measure transport: An introduction. Handbook of uncertainty quantification 1:2 Mnih et al [2015] Mnih V, Kavukcuoglu K, Silver D, et al (2015) Human-level control through deep reinforcement learning. 
nature 518(7540):529–533 MORIMURA [2010] MORIMURA T (2010) Parametric return density estimation for reinforcement learning. In: Conference on Uncertainty in Artificial Intelligence, 2010 Nam et al [2021] Nam DW, Kim Y, Park CY (2021) Gmac: A distributional perspective on actor-critic framework. In: ICML 2021, PMLR, pp 7927–7936 Ororbia and Kifer [2022] Ororbia A, Kifer D (2022) The neural coding framework for learning generative models. Nature communications 13(1):2064 Parno and Marzouk [2018] Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Lindenberg B, Nordqvist J, Lindahl KO (2022) Conjugated discrete distributions for distributional reinforcement learning. In: AAAI 2022, pp 7516–7524 Ma et al [2020] Ma X, Xia L, Zhou Z, et al (2020) Dsac: distributional soft actor critic for risk-sensitive reinforcement learning. arXiv preprint arXiv:200414547 Marzouk et al [2016] Marzouk Y, Moselhy T, Parno M, et al (2016) Sampling via measure transport: An introduction. 
Handbook of uncertainty quantification 1:2 Mnih et al [2015] Mnih V, Kavukcuoglu K, Silver D, et al (2015) Human-level control through deep reinforcement learning. nature 518(7540):529–533 MORIMURA [2010] MORIMURA T (2010) Parametric return density estimation for reinforcement learning. In: Conference on Uncertainty in Artificial Intelligence, 2010 Nam et al [2021] Nam DW, Kim Y, Park CY (2021) Gmac: A distributional perspective on actor-critic framework. In: ICML 2021, PMLR, pp 7927–7936 Ororbia and Kifer [2022] Ororbia A, Kifer D (2022) The neural coding framework for learning generative models. Nature communications 13(1):2064 Parno and Marzouk [2018] Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Ma X, Xia L, Zhou Z, et al (2020) Dsac: distributional soft actor critic for risk-sensitive reinforcement learning. arXiv preprint arXiv:200414547 Marzouk et al [2016] Marzouk Y, Moselhy T, Parno M, et al (2016) Sampling via measure transport: An introduction. 
Handbook of uncertainty quantification 1:2 Mnih et al [2015] Mnih V, Kavukcuoglu K, Silver D, et al (2015) Human-level control through deep reinforcement learning. nature 518(7540):529–533 MORIMURA [2010] MORIMURA T (2010) Parametric return density estimation for reinforcement learning. In: Conference on Uncertainty in Artificial Intelligence, 2010 Nam et al [2021] Nam DW, Kim Y, Park CY (2021) Gmac: A distributional perspective on actor-critic framework. In: ICML 2021, PMLR, pp 7927–7936 Ororbia and Kifer [2022] Ororbia A, Kifer D (2022) The neural coding framework for learning generative models. Nature communications 13(1):2064 Parno and Marzouk [2018] Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Marzouk Y, Moselhy T, Parno M, et al (2016) Sampling via measure transport: An introduction. Handbook of uncertainty quantification 1:2 Mnih et al [2015] Mnih V, Kavukcuoglu K, Silver D, et al (2015) Human-level control through deep reinforcement learning. 
nature 518(7540):529–533 MORIMURA [2010] MORIMURA T (2010) Parametric return density estimation for reinforcement learning. In: Conference on Uncertainty in Artificial Intelligence, 2010 Nam et al [2021] Nam DW, Kim Y, Park CY (2021) Gmac: A distributional perspective on actor-critic framework. In: ICML 2021, PMLR, pp 7927–7936 Ororbia and Kifer [2022] Ororbia A, Kifer D (2022) The neural coding framework for learning generative models. Nature communications 13(1):2064 Parno and Marzouk [2018] Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Mnih V, Kavukcuoglu K, Silver D, et al (2015) Human-level control through deep reinforcement learning. nature 518(7540):529–533 MORIMURA [2010] MORIMURA T (2010) Parametric return density estimation for reinforcement learning. In: Conference on Uncertainty in Artificial Intelligence, 2010 Nam et al [2021] Nam DW, Kim Y, Park CY (2021) Gmac: A distributional perspective on actor-critic framework. 
In: ICML 2021, PMLR, pp 7927–7936 Ororbia and Kifer [2022] Ororbia A, Kifer D (2022) The neural coding framework for learning generative models. Nature communications 13(1):2064 Parno and Marzouk [2018] Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University MORIMURA T (2010) Parametric return density estimation for reinforcement learning. In: Conference on Uncertainty in Artificial Intelligence, 2010 Nam et al [2021] Nam DW, Kim Y, Park CY (2021) Gmac: A distributional perspective on actor-critic framework. In: ICML 2021, PMLR, pp 7927–7936 Ororbia and Kifer [2022] Ororbia A, Kifer D (2022) The neural coding framework for learning generative models. Nature communications 13(1):2064 Parno and Marzouk [2018] Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. 
Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Nam DW, Kim Y, Park CY (2021) Gmac: A distributional perspective on actor-critic framework. In: ICML 2021, PMLR, pp 7927–7936 Ororbia and Kifer [2022] Ororbia A, Kifer D (2022) The neural coding framework for learning generative models. Nature communications 13(1):2064 Parno and Marzouk [2018] Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. 
In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Ororbia A, Kifer D (2022) The neural coding framework for learning generative models. Nature communications 13(1):2064 Parno and Marzouk [2018] Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. 
In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. 
Carnegie Mellon University Dabney W, Rowland M, Bellemare M, et al (2018b) Distributional reinforcement learning with quantile regression. In: AAAI 2018 Duan et al [2021] Duan J, Guan Y, Li SE, et al (2021) Distributional soft actor-critic: Off-policy reinforcement learning for addressing value estimation errors. IEEE transactions on neural networks and learning systems Engel et al [2005] Engel Y, Mannor S, Meir R (2005) Reinforcement learning with gaussian processes. In: Proceedings of the 22nd international conference on Machine learning, pp 201–208 Fujimoto et al [2018] Fujimoto S, Hoof H, Meger D (2018) Addressing function approximation error in actor-critic methods. In: ICML 2018, PMLR, pp 1587–1596 Goodfellow et al [2020] Goodfellow I, Pouget-Abadie J, Mirza M, et al (2020) Generative adversarial networks. Communications of the ACM 63(11):139–144 Haarnoja et al [2017] Haarnoja T, Tang H, Abbeel P, et al (2017) Reinforcement learning with deep energy-based policies. In: ICML 2017, PMLR, pp 1352–1361 Haarnoja et al [2018] Haarnoja T, Zhou A, Hartikainen K, et al (2018) Soft actor-critic algorithms and applications. arXiv preprint arXiv:181205905 Heess et al [2015] Heess N, Wayne G, Silver D, et al (2015) Learning continuous control policies by stochastic value gradients. Advances in neural information processing systems 28 Kingma and Welling [2013] Kingma DP, Welling M (2013) Auto-encoding variational bayes. arXiv preprint arXiv:13126114 Kingma et al [2014] Kingma DP, Mohamed S, Jimenez Rezende D, et al (2014) Semi-supervised learning with deep generative models. NIPS 2014 27 Konda and Tsitsiklis [1999] Konda V, Tsitsiklis J (1999) Actor-critic algorithms. Advances in neural information processing systems 12 Kuznetsov et al [2020] Kuznetsov A, Shvechikov P, Grishin A, et al (2020) Controlling overestimation bias with truncated mixture of continuous distributional quantile critics. In: International Conference on Machine Learning, PMLR, pp 5556–5566 Lillicrap et al [2015] Lillicrap TP, Hunt JJ, Pritzel A, et al (2015) Continuous control with deep reinforcement learning. arXiv preprint arXiv:150902971 Lindenberg et al [2022] Lindenberg B, Nordqvist J, Lindahl KO (2022) Conjugated discrete distributions for distributional reinforcement learning. In: AAAI 2022, pp 7516–7524 Ma et al [2020] Ma X, Xia L, Zhou Z, et al (2020) Dsac: distributional soft actor critic for risk-sensitive reinforcement learning. arXiv preprint arXiv:200414547 Marzouk et al [2016] Marzouk Y, Moselhy T, Parno M, et al (2016) Sampling via measure transport: An introduction. Handbook of uncertainty quantification 1:2 Mnih et al [2015] Mnih V, Kavukcuoglu K, Silver D, et al (2015) Human-level control through deep reinforcement learning. nature 518(7540):529–533 MORIMURA [2010] MORIMURA T (2010) Parametric return density estimation for reinforcement learning. In: Conference on Uncertainty in Artificial Intelligence, 2010 Nam et al [2021] Nam DW, Kim Y, Park CY (2021) Gmac: A distributional perspective on actor-critic framework. In: ICML 2021, PMLR, pp 7927–7936 Ororbia and Kifer [2022] Ororbia A, Kifer D (2022) The neural coding framework for learning generative models. Nature communications 13(1):2064 Parno and Marzouk [2018] Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. 
Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Duan J, Guan Y, Li SE, et al (2021) Distributional soft actor-critic: Off-policy reinforcement learning for addressing value estimation errors. IEEE transactions on neural networks and learning systems Engel et al [2005] Engel Y, Mannor S, Meir R (2005) Reinforcement learning with gaussian processes. In: Proceedings of the 22nd international conference on Machine learning, pp 201–208 Fujimoto et al [2018] Fujimoto S, Hoof H, Meger D (2018) Addressing function approximation error in actor-critic methods. In: ICML 2018, PMLR, pp 1587–1596 Goodfellow et al [2020] Goodfellow I, Pouget-Abadie J, Mirza M, et al (2020) Generative adversarial networks. Communications of the ACM 63(11):139–144 Haarnoja et al [2017] Haarnoja T, Tang H, Abbeel P, et al (2017) Reinforcement learning with deep energy-based policies. In: ICML 2017, PMLR, pp 1352–1361 Haarnoja et al [2018] Haarnoja T, Zhou A, Hartikainen K, et al (2018) Soft actor-critic algorithms and applications. arXiv preprint arXiv:181205905 Heess et al [2015] Heess N, Wayne G, Silver D, et al (2015) Learning continuous control policies by stochastic value gradients. Advances in neural information processing systems 28 Kingma and Welling [2013] Kingma DP, Welling M (2013) Auto-encoding variational bayes. 
arXiv preprint arXiv:13126114 Kingma et al [2014] Kingma DP, Mohamed S, Jimenez Rezende D, et al (2014) Semi-supervised learning with deep generative models. NIPS 2014 27 Konda and Tsitsiklis [1999] Konda V, Tsitsiklis J (1999) Actor-critic algorithms. Advances in neural information processing systems 12 Kuznetsov et al [2020] Kuznetsov A, Shvechikov P, Grishin A, et al (2020) Controlling overestimation bias with truncated mixture of continuous distributional quantile critics. In: International Conference on Machine Learning, PMLR, pp 5556–5566 Lillicrap et al [2015] Lillicrap TP, Hunt JJ, Pritzel A, et al (2015) Continuous control with deep reinforcement learning. arXiv preprint arXiv:150902971 Lindenberg et al [2022] Lindenberg B, Nordqvist J, Lindahl KO (2022) Conjugated discrete distributions for distributional reinforcement learning. In: AAAI 2022, pp 7516–7524 Ma et al [2020] Ma X, Xia L, Zhou Z, et al (2020) Dsac: distributional soft actor critic for risk-sensitive reinforcement learning. arXiv preprint arXiv:200414547 Marzouk et al [2016] Marzouk Y, Moselhy T, Parno M, et al (2016) Sampling via measure transport: An introduction. Handbook of uncertainty quantification 1:2 Mnih et al [2015] Mnih V, Kavukcuoglu K, Silver D, et al (2015) Human-level control through deep reinforcement learning. nature 518(7540):529–533 MORIMURA [2010] MORIMURA T (2010) Parametric return density estimation for reinforcement learning. In: Conference on Uncertainty in Artificial Intelligence, 2010 Nam et al [2021] Nam DW, Kim Y, Park CY (2021) Gmac: A distributional perspective on actor-critic framework. In: ICML 2021, PMLR, pp 7927–7936 Ororbia and Kifer [2022] Ororbia A, Kifer D (2022) The neural coding framework for learning generative models. Nature communications 13(1):2064 Parno and Marzouk [2018] Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. 
In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Engel Y, Mannor S, Meir R (2005) Reinforcement learning with gaussian processes. In: Proceedings of the 22nd international conference on Machine learning, pp 201–208 Fujimoto et al [2018] Fujimoto S, Hoof H, Meger D (2018) Addressing function approximation error in actor-critic methods. In: ICML 2018, PMLR, pp 1587–1596 Goodfellow et al [2020] Goodfellow I, Pouget-Abadie J, Mirza M, et al (2020) Generative adversarial networks. Communications of the ACM 63(11):139–144 Haarnoja et al [2017] Haarnoja T, Tang H, Abbeel P, et al (2017) Reinforcement learning with deep energy-based policies. In: ICML 2017, PMLR, pp 1352–1361 Haarnoja et al [2018] Haarnoja T, Zhou A, Hartikainen K, et al (2018) Soft actor-critic algorithms and applications. arXiv preprint arXiv:181205905 Heess et al [2015] Heess N, Wayne G, Silver D, et al (2015) Learning continuous control policies by stochastic value gradients. Advances in neural information processing systems 28 Kingma and Welling [2013] Kingma DP, Welling M (2013) Auto-encoding variational bayes. arXiv preprint arXiv:13126114 Kingma et al [2014] Kingma DP, Mohamed S, Jimenez Rezende D, et al (2014) Semi-supervised learning with deep generative models. NIPS 2014 27 Konda and Tsitsiklis [1999] Konda V, Tsitsiklis J (1999) Actor-critic algorithms. Advances in neural information processing systems 12 Kuznetsov et al [2020] Kuznetsov A, Shvechikov P, Grishin A, et al (2020) Controlling overestimation bias with truncated mixture of continuous distributional quantile critics. In: International Conference on Machine Learning, PMLR, pp 5556–5566 Lillicrap et al [2015] Lillicrap TP, Hunt JJ, Pritzel A, et al (2015) Continuous control with deep reinforcement learning. arXiv preprint arXiv:150902971 Lindenberg et al [2022] Lindenberg B, Nordqvist J, Lindahl KO (2022) Conjugated discrete distributions for distributional reinforcement learning. In: AAAI 2022, pp 7516–7524 Ma et al [2020] Ma X, Xia L, Zhou Z, et al (2020) Dsac: distributional soft actor critic for risk-sensitive reinforcement learning. arXiv preprint arXiv:200414547 Marzouk et al [2016] Marzouk Y, Moselhy T, Parno M, et al (2016) Sampling via measure transport: An introduction. Handbook of uncertainty quantification 1:2 Mnih et al [2015] Mnih V, Kavukcuoglu K, Silver D, et al (2015) Human-level control through deep reinforcement learning. nature 518(7540):529–533 MORIMURA [2010] MORIMURA T (2010) Parametric return density estimation for reinforcement learning. 
In: Conference on Uncertainty in Artificial Intelligence, 2010 Nam et al [2021] Nam DW, Kim Y, Park CY (2021) Gmac: A distributional perspective on actor-critic framework. In: ICML 2021, PMLR, pp 7927–7936 Ororbia and Kifer [2022] Ororbia A, Kifer D (2022) The neural coding framework for learning generative models. Nature communications 13(1):2064 Parno and Marzouk [2018] Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Fujimoto S, Hoof H, Meger D (2018) Addressing function approximation error in actor-critic methods. In: ICML 2018, PMLR, pp 1587–1596 Goodfellow et al [2020] Goodfellow I, Pouget-Abadie J, Mirza M, et al (2020) Generative adversarial networks. Communications of the ACM 63(11):139–144 Haarnoja et al [2017] Haarnoja T, Tang H, Abbeel P, et al (2017) Reinforcement learning with deep energy-based policies. In: ICML 2017, PMLR, pp 1352–1361 Haarnoja et al [2018] Haarnoja T, Zhou A, Hartikainen K, et al (2018) Soft actor-critic algorithms and applications. 
arXiv preprint arXiv:181205905 Heess et al [2015] Heess N, Wayne G, Silver D, et al (2015) Learning continuous control policies by stochastic value gradients. Advances in neural information processing systems 28 Kingma and Welling [2013] Kingma DP, Welling M (2013) Auto-encoding variational bayes. arXiv preprint arXiv:13126114 Kingma et al [2014] Kingma DP, Mohamed S, Jimenez Rezende D, et al (2014) Semi-supervised learning with deep generative models. NIPS 2014 27 Konda and Tsitsiklis [1999] Konda V, Tsitsiklis J (1999) Actor-critic algorithms. Advances in neural information processing systems 12 Kuznetsov et al [2020] Kuznetsov A, Shvechikov P, Grishin A, et al (2020) Controlling overestimation bias with truncated mixture of continuous distributional quantile critics. In: International Conference on Machine Learning, PMLR, pp 5556–5566 Lillicrap et al [2015] Lillicrap TP, Hunt JJ, Pritzel A, et al (2015) Continuous control with deep reinforcement learning. arXiv preprint arXiv:150902971 Lindenberg et al [2022] Lindenberg B, Nordqvist J, Lindahl KO (2022) Conjugated discrete distributions for distributional reinforcement learning. In: AAAI 2022, pp 7516–7524 Ma et al [2020] Ma X, Xia L, Zhou Z, et al (2020) Dsac: distributional soft actor critic for risk-sensitive reinforcement learning. arXiv preprint arXiv:200414547 Marzouk et al [2016] Marzouk Y, Moselhy T, Parno M, et al (2016) Sampling via measure transport: An introduction. Handbook of uncertainty quantification 1:2 Mnih et al [2015] Mnih V, Kavukcuoglu K, Silver D, et al (2015) Human-level control through deep reinforcement learning. nature 518(7540):529–533 MORIMURA [2010] MORIMURA T (2010) Parametric return density estimation for reinforcement learning. In: Conference on Uncertainty in Artificial Intelligence, 2010 Nam et al [2021] Nam DW, Kim Y, Park CY (2021) Gmac: A distributional perspective on actor-critic framework. In: ICML 2021, PMLR, pp 7927–7936 Ororbia and Kifer [2022] Ororbia A, Kifer D (2022) The neural coding framework for learning generative models. Nature communications 13(1):2064 Parno and Marzouk [2018] Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. 
arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Goodfellow I, Pouget-Abadie J, Mirza M, et al (2020) Generative adversarial networks. Communications of the ACM 63(11):139–144 Haarnoja et al [2017] Haarnoja T, Tang H, Abbeel P, et al (2017) Reinforcement learning with deep energy-based policies. In: ICML 2017, PMLR, pp 1352–1361 Haarnoja et al [2018] Haarnoja T, Zhou A, Hartikainen K, et al (2018) Soft actor-critic algorithms and applications. arXiv preprint arXiv:181205905 Heess et al [2015] Heess N, Wayne G, Silver D, et al (2015) Learning continuous control policies by stochastic value gradients. Advances in neural information processing systems 28 Kingma and Welling [2013] Kingma DP, Welling M (2013) Auto-encoding variational bayes. arXiv preprint arXiv:13126114 Kingma et al [2014] Kingma DP, Mohamed S, Jimenez Rezende D, et al (2014) Semi-supervised learning with deep generative models. NIPS 2014 27 Konda and Tsitsiklis [1999] Konda V, Tsitsiklis J (1999) Actor-critic algorithms. Advances in neural information processing systems 12 Kuznetsov et al [2020] Kuznetsov A, Shvechikov P, Grishin A, et al (2020) Controlling overestimation bias with truncated mixture of continuous distributional quantile critics. In: International Conference on Machine Learning, PMLR, pp 5556–5566 Lillicrap et al [2015] Lillicrap TP, Hunt JJ, Pritzel A, et al (2015) Continuous control with deep reinforcement learning. arXiv preprint arXiv:150902971 Lindenberg et al [2022] Lindenberg B, Nordqvist J, Lindahl KO (2022) Conjugated discrete distributions for distributional reinforcement learning. In: AAAI 2022, pp 7516–7524 Ma et al [2020] Ma X, Xia L, Zhou Z, et al (2020) Dsac: distributional soft actor critic for risk-sensitive reinforcement learning. arXiv preprint arXiv:200414547 Marzouk et al [2016] Marzouk Y, Moselhy T, Parno M, et al (2016) Sampling via measure transport: An introduction. Handbook of uncertainty quantification 1:2 Mnih et al [2015] Mnih V, Kavukcuoglu K, Silver D, et al (2015) Human-level control through deep reinforcement learning. nature 518(7540):529–533 MORIMURA [2010] MORIMURA T (2010) Parametric return density estimation for reinforcement learning. In: Conference on Uncertainty in Artificial Intelligence, 2010 Nam et al [2021] Nam DW, Kim Y, Park CY (2021) Gmac: A distributional perspective on actor-critic framework. 
In: ICML 2021, PMLR, pp 7927–7936 Ororbia and Kifer [2022] Ororbia A, Kifer D (2022) The neural coding framework for learning generative models. Nature communications 13(1):2064 Parno and Marzouk [2018] Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Haarnoja T, Tang H, Abbeel P, et al (2017) Reinforcement learning with deep energy-based policies. In: ICML 2017, PMLR, pp 1352–1361 Haarnoja et al [2018] Haarnoja T, Zhou A, Hartikainen K, et al (2018) Soft actor-critic algorithms and applications. arXiv preprint arXiv:181205905 Heess et al [2015] Heess N, Wayne G, Silver D, et al (2015) Learning continuous control policies by stochastic value gradients. Advances in neural information processing systems 28 Kingma and Welling [2013] Kingma DP, Welling M (2013) Auto-encoding variational bayes. arXiv preprint arXiv:13126114 Kingma et al [2014] Kingma DP, Mohamed S, Jimenez Rezende D, et al (2014) Semi-supervised learning with deep generative models. NIPS 2014 27 Konda and Tsitsiklis [1999] Konda V, Tsitsiklis J (1999) Actor-critic algorithms. 
Advances in neural information processing systems 12 Kuznetsov et al [2020] Kuznetsov A, Shvechikov P, Grishin A, et al (2020) Controlling overestimation bias with truncated mixture of continuous distributional quantile critics. In: International Conference on Machine Learning, PMLR, pp 5556–5566 Lillicrap et al [2015] Lillicrap TP, Hunt JJ, Pritzel A, et al (2015) Continuous control with deep reinforcement learning. arXiv preprint arXiv:150902971 Lindenberg et al [2022] Lindenberg B, Nordqvist J, Lindahl KO (2022) Conjugated discrete distributions for distributional reinforcement learning. In: AAAI 2022, pp 7516–7524 Ma et al [2020] Ma X, Xia L, Zhou Z, et al (2020) Dsac: distributional soft actor critic for risk-sensitive reinforcement learning. arXiv preprint arXiv:200414547 Marzouk et al [2016] Marzouk Y, Moselhy T, Parno M, et al (2016) Sampling via measure transport: An introduction. Handbook of uncertainty quantification 1:2 Mnih et al [2015] Mnih V, Kavukcuoglu K, Silver D, et al (2015) Human-level control through deep reinforcement learning. nature 518(7540):529–533 MORIMURA [2010] MORIMURA T (2010) Parametric return density estimation for reinforcement learning. In: Conference on Uncertainty in Artificial Intelligence, 2010 Nam et al [2021] Nam DW, Kim Y, Park CY (2021) Gmac: A distributional perspective on actor-critic framework. In: ICML 2021, PMLR, pp 7927–7936 Ororbia and Kifer [2022] Ororbia A, Kifer D (2022) The neural coding framework for learning generative models. Nature communications 13(1):2064 Parno and Marzouk [2018] Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. 
MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Haarnoja T, Zhou A, Hartikainen K, et al (2018) Soft actor-critic algorithms and applications. arXiv preprint arXiv:181205905 Heess et al [2015] Heess N, Wayne G, Silver D, et al (2015) Learning continuous control policies by stochastic value gradients. Advances in neural information processing systems 28 Kingma and Welling [2013] Kingma DP, Welling M (2013) Auto-encoding variational bayes. arXiv preprint arXiv:13126114 Kingma et al [2014] Kingma DP, Mohamed S, Jimenez Rezende D, et al (2014) Semi-supervised learning with deep generative models. NIPS 2014 27 Konda and Tsitsiklis [1999] Konda V, Tsitsiklis J (1999) Actor-critic algorithms. Advances in neural information processing systems 12 Kuznetsov et al [2020] Kuznetsov A, Shvechikov P, Grishin A, et al (2020) Controlling overestimation bias with truncated mixture of continuous distributional quantile critics. In: International Conference on Machine Learning, PMLR, pp 5556–5566 Lillicrap et al [2015] Lillicrap TP, Hunt JJ, Pritzel A, et al (2015) Continuous control with deep reinforcement learning. arXiv preprint arXiv:150902971 Lindenberg et al [2022] Lindenberg B, Nordqvist J, Lindahl KO (2022) Conjugated discrete distributions for distributional reinforcement learning. In: AAAI 2022, pp 7516–7524 Ma et al [2020] Ma X, Xia L, Zhou Z, et al (2020) Dsac: distributional soft actor critic for risk-sensitive reinforcement learning. arXiv preprint arXiv:200414547 Marzouk et al [2016] Marzouk Y, Moselhy T, Parno M, et al (2016) Sampling via measure transport: An introduction. Handbook of uncertainty quantification 1:2 Mnih et al [2015] Mnih V, Kavukcuoglu K, Silver D, et al (2015) Human-level control through deep reinforcement learning. nature 518(7540):529–533 MORIMURA [2010] MORIMURA T (2010) Parametric return density estimation for reinforcement learning. In: Conference on Uncertainty in Artificial Intelligence, 2010 Nam et al [2021] Nam DW, Kim Y, Park CY (2021) Gmac: A distributional perspective on actor-critic framework. In: ICML 2021, PMLR, pp 7927–7936 Ororbia and Kifer [2022] Ororbia A, Kifer D (2022) The neural coding framework for learning generative models. Nature communications 13(1):2064 Parno and Marzouk [2018] Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. 
Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Heess N, Wayne G, Silver D, et al (2015) Learning continuous control policies by stochastic value gradients. Advances in neural information processing systems 28 Kingma and Welling [2013] Kingma DP, Welling M (2013) Auto-encoding variational bayes. arXiv preprint arXiv:13126114 Kingma et al [2014] Kingma DP, Mohamed S, Jimenez Rezende D, et al (2014) Semi-supervised learning with deep generative models. NIPS 2014 27 Konda and Tsitsiklis [1999] Konda V, Tsitsiklis J (1999) Actor-critic algorithms. Advances in neural information processing systems 12 Kuznetsov et al [2020] Kuznetsov A, Shvechikov P, Grishin A, et al (2020) Controlling overestimation bias with truncated mixture of continuous distributional quantile critics. In: International Conference on Machine Learning, PMLR, pp 5556–5566 Lillicrap et al [2015] Lillicrap TP, Hunt JJ, Pritzel A, et al (2015) Continuous control with deep reinforcement learning. arXiv preprint arXiv:150902971 Lindenberg et al [2022] Lindenberg B, Nordqvist J, Lindahl KO (2022) Conjugated discrete distributions for distributional reinforcement learning. In: AAAI 2022, pp 7516–7524 Ma et al [2020] Ma X, Xia L, Zhou Z, et al (2020) Dsac: distributional soft actor critic for risk-sensitive reinforcement learning. arXiv preprint arXiv:200414547 Marzouk et al [2016] Marzouk Y, Moselhy T, Parno M, et al (2016) Sampling via measure transport: An introduction. Handbook of uncertainty quantification 1:2 Mnih et al [2015] Mnih V, Kavukcuoglu K, Silver D, et al (2015) Human-level control through deep reinforcement learning. nature 518(7540):529–533 MORIMURA [2010] MORIMURA T (2010) Parametric return density estimation for reinforcement learning. 
In: Conference on Uncertainty in Artificial Intelligence, 2010 Nam et al [2021] Nam DW, Kim Y, Park CY (2021) Gmac: A distributional perspective on actor-critic framework. In: ICML 2021, PMLR, pp 7927–7936 Ororbia and Kifer [2022] Ororbia A, Kifer D (2022) The neural coding framework for learning generative models. Nature communications 13(1):2064 Parno and Marzouk [2018] Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Kingma DP, Welling M (2013) Auto-encoding variational bayes. arXiv preprint arXiv:13126114 Kingma et al [2014] Kingma DP, Mohamed S, Jimenez Rezende D, et al (2014) Semi-supervised learning with deep generative models. NIPS 2014 27 Konda and Tsitsiklis [1999] Konda V, Tsitsiklis J (1999) Actor-critic algorithms. Advances in neural information processing systems 12 Kuznetsov et al [2020] Kuznetsov A, Shvechikov P, Grishin A, et al (2020) Controlling overestimation bias with truncated mixture of continuous distributional quantile critics. 
In: International Conference on Machine Learning, PMLR, pp 5556–5566 Lillicrap et al [2015] Lillicrap TP, Hunt JJ, Pritzel A, et al (2015) Continuous control with deep reinforcement learning. arXiv preprint arXiv:150902971 Lindenberg et al [2022] Lindenberg B, Nordqvist J, Lindahl KO (2022) Conjugated discrete distributions for distributional reinforcement learning. In: AAAI 2022, pp 7516–7524 Ma et al [2020] Ma X, Xia L, Zhou Z, et al (2020) Dsac: distributional soft actor critic for risk-sensitive reinforcement learning. arXiv preprint arXiv:200414547 Marzouk et al [2016] Marzouk Y, Moselhy T, Parno M, et al (2016) Sampling via measure transport: An introduction. Handbook of uncertainty quantification 1:2 Mnih et al [2015] Mnih V, Kavukcuoglu K, Silver D, et al (2015) Human-level control through deep reinforcement learning. nature 518(7540):529–533 MORIMURA [2010] MORIMURA T (2010) Parametric return density estimation for reinforcement learning. In: Conference on Uncertainty in Artificial Intelligence, 2010 Nam et al [2021] Nam DW, Kim Y, Park CY (2021) Gmac: A distributional perspective on actor-critic framework. In: ICML 2021, PMLR, pp 7927–7936 Ororbia and Kifer [2022] Ororbia A, Kifer D (2022) The neural coding framework for learning generative models. Nature communications 13(1):2064 Parno and Marzouk [2018] Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. 
NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University
In: ICRA 2021, IEEE, pp 8337–8344 Chow et al [2015] Chow Y, Tamar A, Mannor S, et al (2015) Risk-sensitive and robust decision-making: a cvar optimization approach. Advances in neural information processing systems 28 Chow et al [2017] Chow Y, Ghavamzadeh M, Janson L, et al (2017) Risk-constrained reinforcement learning with percentile risk criteria. J Mach Learn Res 18(1):6070–6120 Creswell et al [2018] Creswell A, White T, Dumoulin V, et al (2018) Generative adversarial networks: An overview. IEEE signal processing magazine 35(1):53–65 Dabney et al [2018a] Dabney W, Ostrovski G, Silver D, et al (2018a) Implicit quantile networks for distributional reinforcement learning. In: ICML 2018, PMLR, pp 1096–1105 Dabney et al [2018b] Dabney W, Rowland M, Bellemare M, et al (2018b) Distributional reinforcement learning with quantile regression. In: AAAI 2018 Duan et al [2021] Duan J, Guan Y, Li SE, et al (2021) Distributional soft actor-critic: Off-policy reinforcement learning for addressing value estimation errors. IEEE transactions on neural networks and learning systems Engel et al [2005] Engel Y, Mannor S, Meir R (2005) Reinforcement learning with gaussian processes. In: Proceedings of the 22nd international conference on Machine learning, pp 201–208 Fujimoto et al [2018] Fujimoto S, Hoof H, Meger D (2018) Addressing function approximation error in actor-critic methods. In: ICML 2018, PMLR, pp 1587–1596 Goodfellow et al [2020] Goodfellow I, Pouget-Abadie J, Mirza M, et al (2020) Generative adversarial networks. Communications of the ACM 63(11):139–144 Haarnoja et al [2017] Haarnoja T, Tang H, Abbeel P, et al (2017) Reinforcement learning with deep energy-based policies. In: ICML 2017, PMLR, pp 1352–1361 Haarnoja et al [2018] Haarnoja T, Zhou A, Hartikainen K, et al (2018) Soft actor-critic algorithms and applications. arXiv preprint arXiv:181205905 Heess et al [2015] Heess N, Wayne G, Silver D, et al (2015) Learning continuous control policies by stochastic value gradients. Advances in neural information processing systems 28 Kingma and Welling [2013] Kingma DP, Welling M (2013) Auto-encoding variational bayes. arXiv preprint arXiv:13126114 Kingma et al [2014] Kingma DP, Mohamed S, Jimenez Rezende D, et al (2014) Semi-supervised learning with deep generative models. NIPS 2014 27 Konda and Tsitsiklis [1999] Konda V, Tsitsiklis J (1999) Actor-critic algorithms. Advances in neural information processing systems 12 Kuznetsov et al [2020] Kuznetsov A, Shvechikov P, Grishin A, et al (2020) Controlling overestimation bias with truncated mixture of continuous distributional quantile critics. In: International Conference on Machine Learning, PMLR, pp 5556–5566 Lillicrap et al [2015] Lillicrap TP, Hunt JJ, Pritzel A, et al (2015) Continuous control with deep reinforcement learning. arXiv preprint arXiv:150902971 Lindenberg et al [2022] Lindenberg B, Nordqvist J, Lindahl KO (2022) Conjugated discrete distributions for distributional reinforcement learning. In: AAAI 2022, pp 7516–7524 Ma et al [2020] Ma X, Xia L, Zhou Z, et al (2020) Dsac: distributional soft actor critic for risk-sensitive reinforcement learning. arXiv preprint arXiv:200414547 Marzouk et al [2016] Marzouk Y, Moselhy T, Parno M, et al (2016) Sampling via measure transport: An introduction. Handbook of uncertainty quantification 1:2 Mnih et al [2015] Mnih V, Kavukcuoglu K, Silver D, et al (2015) Human-level control through deep reinforcement learning. 
nature 518(7540):529–533 MORIMURA [2010] MORIMURA T (2010) Parametric return density estimation for reinforcement learning. In: Conference on Uncertainty in Artificial Intelligence, 2010 Nam et al [2021] Nam DW, Kim Y, Park CY (2021) Gmac: A distributional perspective on actor-critic framework. In: ICML 2021, PMLR, pp 7927–7936 Ororbia and Kifer [2022] Ororbia A, Kifer D (2022) The neural coding framework for learning generative models. Nature communications 13(1):2064 Parno and Marzouk [2018] Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Chow Y, Tamar A, Mannor S, et al (2015) Risk-sensitive and robust decision-making: a cvar optimization approach. Advances in neural information processing systems 28 Chow et al [2017] Chow Y, Ghavamzadeh M, Janson L, et al (2017) Risk-constrained reinforcement learning with percentile risk criteria. J Mach Learn Res 18(1):6070–6120 Creswell et al [2018] Creswell A, White T, Dumoulin V, et al (2018) Generative adversarial networks: An overview. 
IEEE signal processing magazine 35(1):53–65 Dabney et al [2018a] Dabney W, Ostrovski G, Silver D, et al (2018a) Implicit quantile networks for distributional reinforcement learning. In: ICML 2018, PMLR, pp 1096–1105 Dabney et al [2018b] Dabney W, Rowland M, Bellemare M, et al (2018b) Distributional reinforcement learning with quantile regression. In: AAAI 2018 Duan et al [2021] Duan J, Guan Y, Li SE, et al (2021) Distributional soft actor-critic: Off-policy reinforcement learning for addressing value estimation errors. IEEE transactions on neural networks and learning systems Engel et al [2005] Engel Y, Mannor S, Meir R (2005) Reinforcement learning with gaussian processes. In: Proceedings of the 22nd international conference on Machine learning, pp 201–208 Fujimoto et al [2018] Fujimoto S, Hoof H, Meger D (2018) Addressing function approximation error in actor-critic methods. In: ICML 2018, PMLR, pp 1587–1596 Goodfellow et al [2020] Goodfellow I, Pouget-Abadie J, Mirza M, et al (2020) Generative adversarial networks. Communications of the ACM 63(11):139–144 Haarnoja et al [2017] Haarnoja T, Tang H, Abbeel P, et al (2017) Reinforcement learning with deep energy-based policies. In: ICML 2017, PMLR, pp 1352–1361 Haarnoja et al [2018] Haarnoja T, Zhou A, Hartikainen K, et al (2018) Soft actor-critic algorithms and applications. arXiv preprint arXiv:181205905 Heess et al [2015] Heess N, Wayne G, Silver D, et al (2015) Learning continuous control policies by stochastic value gradients. Advances in neural information processing systems 28 Kingma and Welling [2013] Kingma DP, Welling M (2013) Auto-encoding variational bayes. arXiv preprint arXiv:13126114 Kingma et al [2014] Kingma DP, Mohamed S, Jimenez Rezende D, et al (2014) Semi-supervised learning with deep generative models. NIPS 2014 27 Konda and Tsitsiklis [1999] Konda V, Tsitsiklis J (1999) Actor-critic algorithms. Advances in neural information processing systems 12 Kuznetsov et al [2020] Kuznetsov A, Shvechikov P, Grishin A, et al (2020) Controlling overestimation bias with truncated mixture of continuous distributional quantile critics. In: International Conference on Machine Learning, PMLR, pp 5556–5566 Lillicrap et al [2015] Lillicrap TP, Hunt JJ, Pritzel A, et al (2015) Continuous control with deep reinforcement learning. arXiv preprint arXiv:150902971 Lindenberg et al [2022] Lindenberg B, Nordqvist J, Lindahl KO (2022) Conjugated discrete distributions for distributional reinforcement learning. In: AAAI 2022, pp 7516–7524 Ma et al [2020] Ma X, Xia L, Zhou Z, et al (2020) Dsac: distributional soft actor critic for risk-sensitive reinforcement learning. arXiv preprint arXiv:200414547 Marzouk et al [2016] Marzouk Y, Moselhy T, Parno M, et al (2016) Sampling via measure transport: An introduction. Handbook of uncertainty quantification 1:2 Mnih et al [2015] Mnih V, Kavukcuoglu K, Silver D, et al (2015) Human-level control through deep reinforcement learning. nature 518(7540):529–533 MORIMURA [2010] MORIMURA T (2010) Parametric return density estimation for reinforcement learning. In: Conference on Uncertainty in Artificial Intelligence, 2010 Nam et al [2021] Nam DW, Kim Y, Park CY (2021) Gmac: A distributional perspective on actor-critic framework. In: ICML 2021, PMLR, pp 7927–7936 Ororbia and Kifer [2022] Ororbia A, Kifer D (2022) The neural coding framework for learning generative models. Nature communications 13(1):2064 Parno and Marzouk [2018] Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. 
SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Chow Y, Ghavamzadeh M, Janson L, et al (2017) Risk-constrained reinforcement learning with percentile risk criteria. J Mach Learn Res 18(1):6070–6120 Creswell et al [2018] Creswell A, White T, Dumoulin V, et al (2018) Generative adversarial networks: An overview. IEEE signal processing magazine 35(1):53–65 Dabney et al [2018a] Dabney W, Ostrovski G, Silver D, et al (2018a) Implicit quantile networks for distributional reinforcement learning. In: ICML 2018, PMLR, pp 1096–1105 Dabney et al [2018b] Dabney W, Rowland M, Bellemare M, et al (2018b) Distributional reinforcement learning with quantile regression. In: AAAI 2018 Duan et al [2021] Duan J, Guan Y, Li SE, et al (2021) Distributional soft actor-critic: Off-policy reinforcement learning for addressing value estimation errors. IEEE transactions on neural networks and learning systems Engel et al [2005] Engel Y, Mannor S, Meir R (2005) Reinforcement learning with gaussian processes. 
In: Proceedings of the 22nd international conference on Machine learning, pp 201–208 Fujimoto et al [2018] Fujimoto S, Hoof H, Meger D (2018) Addressing function approximation error in actor-critic methods. In: ICML 2018, PMLR, pp 1587–1596 Goodfellow et al [2020] Goodfellow I, Pouget-Abadie J, Mirza M, et al (2020) Generative adversarial networks. Communications of the ACM 63(11):139–144 Haarnoja et al [2017] Haarnoja T, Tang H, Abbeel P, et al (2017) Reinforcement learning with deep energy-based policies. In: ICML 2017, PMLR, pp 1352–1361 Haarnoja et al [2018] Haarnoja T, Zhou A, Hartikainen K, et al (2018) Soft actor-critic algorithms and applications. arXiv preprint arXiv:181205905 Heess et al [2015] Heess N, Wayne G, Silver D, et al (2015) Learning continuous control policies by stochastic value gradients. Advances in neural information processing systems 28 Kingma and Welling [2013] Kingma DP, Welling M (2013) Auto-encoding variational bayes. arXiv preprint arXiv:13126114 Kingma et al [2014] Kingma DP, Mohamed S, Jimenez Rezende D, et al (2014) Semi-supervised learning with deep generative models. NIPS 2014 27 Konda and Tsitsiklis [1999] Konda V, Tsitsiklis J (1999) Actor-critic algorithms. Advances in neural information processing systems 12 Kuznetsov et al [2020] Kuznetsov A, Shvechikov P, Grishin A, et al (2020) Controlling overestimation bias with truncated mixture of continuous distributional quantile critics. In: International Conference on Machine Learning, PMLR, pp 5556–5566 Lillicrap et al [2015] Lillicrap TP, Hunt JJ, Pritzel A, et al (2015) Continuous control with deep reinforcement learning. arXiv preprint arXiv:150902971 Lindenberg et al [2022] Lindenberg B, Nordqvist J, Lindahl KO (2022) Conjugated discrete distributions for distributional reinforcement learning. In: AAAI 2022, pp 7516–7524 Ma et al [2020] Ma X, Xia L, Zhou Z, et al (2020) Dsac: distributional soft actor critic for risk-sensitive reinforcement learning. arXiv preprint arXiv:200414547 Marzouk et al [2016] Marzouk Y, Moselhy T, Parno M, et al (2016) Sampling via measure transport: An introduction. Handbook of uncertainty quantification 1:2 Mnih et al [2015] Mnih V, Kavukcuoglu K, Silver D, et al (2015) Human-level control through deep reinforcement learning. nature 518(7540):529–533 MORIMURA [2010] MORIMURA T (2010) Parametric return density estimation for reinforcement learning. In: Conference on Uncertainty in Artificial Intelligence, 2010 Nam et al [2021] Nam DW, Kim Y, Park CY (2021) Gmac: A distributional perspective on actor-critic framework. In: ICML 2021, PMLR, pp 7927–7936 Ororbia and Kifer [2022] Ororbia A, Kifer D (2022) The neural coding framework for learning generative models. Nature communications 13(1):2064 Parno and Marzouk [2018] Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. 
Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Creswell A, White T, Dumoulin V, et al (2018) Generative adversarial networks: An overview. IEEE signal processing magazine 35(1):53–65 Dabney et al [2018a] Dabney W, Ostrovski G, Silver D, et al (2018a) Implicit quantile networks for distributional reinforcement learning. In: ICML 2018, PMLR, pp 1096–1105 Dabney et al [2018b] Dabney W, Rowland M, Bellemare M, et al (2018b) Distributional reinforcement learning with quantile regression. In: AAAI 2018 Duan et al [2021] Duan J, Guan Y, Li SE, et al (2021) Distributional soft actor-critic: Off-policy reinforcement learning for addressing value estimation errors. IEEE transactions on neural networks and learning systems Engel et al [2005] Engel Y, Mannor S, Meir R (2005) Reinforcement learning with gaussian processes. In: Proceedings of the 22nd international conference on Machine learning, pp 201–208 Fujimoto et al [2018] Fujimoto S, Hoof H, Meger D (2018) Addressing function approximation error in actor-critic methods. In: ICML 2018, PMLR, pp 1587–1596 Goodfellow et al [2020] Goodfellow I, Pouget-Abadie J, Mirza M, et al (2020) Generative adversarial networks. Communications of the ACM 63(11):139–144 Haarnoja et al [2017] Haarnoja T, Tang H, Abbeel P, et al (2017) Reinforcement learning with deep energy-based policies. In: ICML 2017, PMLR, pp 1352–1361 Haarnoja et al [2018] Haarnoja T, Zhou A, Hartikainen K, et al (2018) Soft actor-critic algorithms and applications. arXiv preprint arXiv:181205905 Heess et al [2015] Heess N, Wayne G, Silver D, et al (2015) Learning continuous control policies by stochastic value gradients. 
Advances in neural information processing systems 28 Kingma and Welling [2013] Kingma DP, Welling M (2013) Auto-encoding variational bayes. arXiv preprint arXiv:13126114 Kingma et al [2014] Kingma DP, Mohamed S, Jimenez Rezende D, et al (2014) Semi-supervised learning with deep generative models. NIPS 2014 27 Konda and Tsitsiklis [1999] Konda V, Tsitsiklis J (1999) Actor-critic algorithms. Advances in neural information processing systems 12 Kuznetsov et al [2020] Kuznetsov A, Shvechikov P, Grishin A, et al (2020) Controlling overestimation bias with truncated mixture of continuous distributional quantile critics. In: International Conference on Machine Learning, PMLR, pp 5556–5566 Lillicrap et al [2015] Lillicrap TP, Hunt JJ, Pritzel A, et al (2015) Continuous control with deep reinforcement learning. arXiv preprint arXiv:150902971 Lindenberg et al [2022] Lindenberg B, Nordqvist J, Lindahl KO (2022) Conjugated discrete distributions for distributional reinforcement learning. In: AAAI 2022, pp 7516–7524 Ma et al [2020] Ma X, Xia L, Zhou Z, et al (2020) Dsac: distributional soft actor critic for risk-sensitive reinforcement learning. arXiv preprint arXiv:200414547 Marzouk et al [2016] Marzouk Y, Moselhy T, Parno M, et al (2016) Sampling via measure transport: An introduction. Handbook of uncertainty quantification 1:2 Mnih et al [2015] Mnih V, Kavukcuoglu K, Silver D, et al (2015) Human-level control through deep reinforcement learning. nature 518(7540):529–533 MORIMURA [2010] MORIMURA T (2010) Parametric return density estimation for reinforcement learning. In: Conference on Uncertainty in Artificial Intelligence, 2010 Nam et al [2021] Nam DW, Kim Y, Park CY (2021) Gmac: A distributional perspective on actor-critic framework. In: ICML 2021, PMLR, pp 7927–7936 Ororbia and Kifer [2022] Ororbia A, Kifer D (2022) The neural coding framework for learning generative models. Nature communications 13(1):2064 Parno and Marzouk [2018] Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. 
In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Dabney W, Ostrovski G, Silver D, et al (2018a) Implicit quantile networks for distributional reinforcement learning. In: ICML 2018, PMLR, pp 1096–1105 Dabney et al [2018b] Dabney W, Rowland M, Bellemare M, et al (2018b) Distributional reinforcement learning with quantile regression. In: AAAI 2018 Duan et al [2021] Duan J, Guan Y, Li SE, et al (2021) Distributional soft actor-critic: Off-policy reinforcement learning for addressing value estimation errors. IEEE transactions on neural networks and learning systems Engel et al [2005] Engel Y, Mannor S, Meir R (2005) Reinforcement learning with gaussian processes. In: Proceedings of the 22nd international conference on Machine learning, pp 201–208 Fujimoto et al [2018] Fujimoto S, Hoof H, Meger D (2018) Addressing function approximation error in actor-critic methods. In: ICML 2018, PMLR, pp 1587–1596 Goodfellow et al [2020] Goodfellow I, Pouget-Abadie J, Mirza M, et al (2020) Generative adversarial networks. Communications of the ACM 63(11):139–144 Haarnoja et al [2017] Haarnoja T, Tang H, Abbeel P, et al (2017) Reinforcement learning with deep energy-based policies. In: ICML 2017, PMLR, pp 1352–1361 Haarnoja et al [2018] Haarnoja T, Zhou A, Hartikainen K, et al (2018) Soft actor-critic algorithms and applications. arXiv preprint arXiv:181205905 Heess et al [2015] Heess N, Wayne G, Silver D, et al (2015) Learning continuous control policies by stochastic value gradients. Advances in neural information processing systems 28 Kingma and Welling [2013] Kingma DP, Welling M (2013) Auto-encoding variational bayes. arXiv preprint arXiv:13126114 Kingma et al [2014] Kingma DP, Mohamed S, Jimenez Rezende D, et al (2014) Semi-supervised learning with deep generative models. NIPS 2014 27 Konda and Tsitsiklis [1999] Konda V, Tsitsiklis J (1999) Actor-critic algorithms. Advances in neural information processing systems 12 Kuznetsov et al [2020] Kuznetsov A, Shvechikov P, Grishin A, et al (2020) Controlling overestimation bias with truncated mixture of continuous distributional quantile critics. In: International Conference on Machine Learning, PMLR, pp 5556–5566 Lillicrap et al [2015] Lillicrap TP, Hunt JJ, Pritzel A, et al (2015) Continuous control with deep reinforcement learning. arXiv preprint arXiv:150902971 Lindenberg et al [2022] Lindenberg B, Nordqvist J, Lindahl KO (2022) Conjugated discrete distributions for distributional reinforcement learning. 
In: AAAI 2022, pp 7516–7524 Ma et al [2020] Ma X, Xia L, Zhou Z, et al (2020) Dsac: distributional soft actor critic for risk-sensitive reinforcement learning. arXiv preprint arXiv:200414547 Marzouk et al [2016] Marzouk Y, Moselhy T, Parno M, et al (2016) Sampling via measure transport: An introduction. Handbook of uncertainty quantification 1:2 Mnih et al [2015] Mnih V, Kavukcuoglu K, Silver D, et al (2015) Human-level control through deep reinforcement learning. nature 518(7540):529–533 MORIMURA [2010] MORIMURA T (2010) Parametric return density estimation for reinforcement learning. In: Conference on Uncertainty in Artificial Intelligence, 2010 Nam et al [2021] Nam DW, Kim Y, Park CY (2021) Gmac: A distributional perspective on actor-critic framework. In: ICML 2021, PMLR, pp 7927–7936 Ororbia and Kifer [2022] Ororbia A, Kifer D (2022) The neural coding framework for learning generative models. Nature communications 13(1):2064 Parno and Marzouk [2018] Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. 
Carnegie Mellon University Dabney W, Rowland M, Bellemare M, et al (2018b) Distributional reinforcement learning with quantile regression. In: AAAI 2018 Duan et al [2021] Duan J, Guan Y, Li SE, et al (2021) Distributional soft actor-critic: Off-policy reinforcement learning for addressing value estimation errors. IEEE transactions on neural networks and learning systems Engel et al [2005] Engel Y, Mannor S, Meir R (2005) Reinforcement learning with gaussian processes. In: Proceedings of the 22nd international conference on Machine learning, pp 201–208 Fujimoto et al [2018] Fujimoto S, Hoof H, Meger D (2018) Addressing function approximation error in actor-critic methods. In: ICML 2018, PMLR, pp 1587–1596 Goodfellow et al [2020] Goodfellow I, Pouget-Abadie J, Mirza M, et al (2020) Generative adversarial networks. Communications of the ACM 63(11):139–144 Haarnoja et al [2017] Haarnoja T, Tang H, Abbeel P, et al (2017) Reinforcement learning with deep energy-based policies. In: ICML 2017, PMLR, pp 1352–1361 Haarnoja et al [2018] Haarnoja T, Zhou A, Hartikainen K, et al (2018) Soft actor-critic algorithms and applications. arXiv preprint arXiv:181205905 Heess et al [2015] Heess N, Wayne G, Silver D, et al (2015) Learning continuous control policies by stochastic value gradients. Advances in neural information processing systems 28 Kingma and Welling [2013] Kingma DP, Welling M (2013) Auto-encoding variational bayes. arXiv preprint arXiv:13126114 Kingma et al [2014] Kingma DP, Mohamed S, Jimenez Rezende D, et al (2014) Semi-supervised learning with deep generative models. NIPS 2014 27 Konda and Tsitsiklis [1999] Konda V, Tsitsiklis J (1999) Actor-critic algorithms. Advances in neural information processing systems 12 Kuznetsov et al [2020] Kuznetsov A, Shvechikov P, Grishin A, et al (2020) Controlling overestimation bias with truncated mixture of continuous distributional quantile critics. In: International Conference on Machine Learning, PMLR, pp 5556–5566 Lillicrap et al [2015] Lillicrap TP, Hunt JJ, Pritzel A, et al (2015) Continuous control with deep reinforcement learning. arXiv preprint arXiv:150902971 Lindenberg et al [2022] Lindenberg B, Nordqvist J, Lindahl KO (2022) Conjugated discrete distributions for distributional reinforcement learning. In: AAAI 2022, pp 7516–7524 Ma et al [2020] Ma X, Xia L, Zhou Z, et al (2020) Dsac: distributional soft actor critic for risk-sensitive reinforcement learning. arXiv preprint arXiv:200414547 Marzouk et al [2016] Marzouk Y, Moselhy T, Parno M, et al (2016) Sampling via measure transport: An introduction. Handbook of uncertainty quantification 1:2 Mnih et al [2015] Mnih V, Kavukcuoglu K, Silver D, et al (2015) Human-level control through deep reinforcement learning. nature 518(7540):529–533 MORIMURA [2010] MORIMURA T (2010) Parametric return density estimation for reinforcement learning. In: Conference on Uncertainty in Artificial Intelligence, 2010 Nam et al [2021] Nam DW, Kim Y, Park CY (2021) Gmac: A distributional perspective on actor-critic framework. In: ICML 2021, PMLR, pp 7927–7936 Ororbia and Kifer [2022] Ororbia A, Kifer D (2022) The neural coding framework for learning generative models. Nature communications 13(1):2064 Parno and Marzouk [2018] Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. 
Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Duan J, Guan Y, Li SE, et al (2021) Distributional soft actor-critic: Off-policy reinforcement learning for addressing value estimation errors. IEEE transactions on neural networks and learning systems Engel et al [2005] Engel Y, Mannor S, Meir R (2005) Reinforcement learning with gaussian processes. In: Proceedings of the 22nd international conference on Machine learning, pp 201–208 Fujimoto et al [2018] Fujimoto S, Hoof H, Meger D (2018) Addressing function approximation error in actor-critic methods. In: ICML 2018, PMLR, pp 1587–1596 Goodfellow et al [2020] Goodfellow I, Pouget-Abadie J, Mirza M, et al (2020) Generative adversarial networks. Communications of the ACM 63(11):139–144 Haarnoja et al [2017] Haarnoja T, Tang H, Abbeel P, et al (2017) Reinforcement learning with deep energy-based policies. In: ICML 2017, PMLR, pp 1352–1361 Haarnoja et al [2018] Haarnoja T, Zhou A, Hartikainen K, et al (2018) Soft actor-critic algorithms and applications. arXiv preprint arXiv:181205905 Heess et al [2015] Heess N, Wayne G, Silver D, et al (2015) Learning continuous control policies by stochastic value gradients. Advances in neural information processing systems 28 Kingma and Welling [2013] Kingma DP, Welling M (2013) Auto-encoding variational bayes. 
arXiv preprint arXiv:13126114 Kingma et al [2014] Kingma DP, Mohamed S, Jimenez Rezende D, et al (2014) Semi-supervised learning with deep generative models. NIPS 2014 27 Konda and Tsitsiklis [1999] Konda V, Tsitsiklis J (1999) Actor-critic algorithms. Advances in neural information processing systems 12 Kuznetsov et al [2020] Kuznetsov A, Shvechikov P, Grishin A, et al (2020) Controlling overestimation bias with truncated mixture of continuous distributional quantile critics. In: International Conference on Machine Learning, PMLR, pp 5556–5566 Lillicrap et al [2015] Lillicrap TP, Hunt JJ, Pritzel A, et al (2015) Continuous control with deep reinforcement learning. arXiv preprint arXiv:150902971 Lindenberg et al [2022] Lindenberg B, Nordqvist J, Lindahl KO (2022) Conjugated discrete distributions for distributional reinforcement learning. In: AAAI 2022, pp 7516–7524 Ma et al [2020] Ma X, Xia L, Zhou Z, et al (2020) Dsac: distributional soft actor critic for risk-sensitive reinforcement learning. arXiv preprint arXiv:200414547 Marzouk et al [2016] Marzouk Y, Moselhy T, Parno M, et al (2016) Sampling via measure transport: An introduction. Handbook of uncertainty quantification 1:2 Mnih et al [2015] Mnih V, Kavukcuoglu K, Silver D, et al (2015) Human-level control through deep reinforcement learning. nature 518(7540):529–533 MORIMURA [2010] MORIMURA T (2010) Parametric return density estimation for reinforcement learning. In: Conference on Uncertainty in Artificial Intelligence, 2010 Nam et al [2021] Nam DW, Kim Y, Park CY (2021) Gmac: A distributional perspective on actor-critic framework. In: ICML 2021, PMLR, pp 7927–7936 Ororbia and Kifer [2022] Ororbia A, Kifer D (2022) The neural coding framework for learning generative models. Nature communications 13(1):2064 Parno and Marzouk [2018] Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. 
In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Engel Y, Mannor S, Meir R (2005) Reinforcement learning with gaussian processes. In: Proceedings of the 22nd international conference on Machine learning, pp 201–208 Fujimoto et al [2018] Fujimoto S, Hoof H, Meger D (2018) Addressing function approximation error in actor-critic methods. In: ICML 2018, PMLR, pp 1587–1596 Goodfellow et al [2020] Goodfellow I, Pouget-Abadie J, Mirza M, et al (2020) Generative adversarial networks. Communications of the ACM 63(11):139–144 Haarnoja et al [2017] Haarnoja T, Tang H, Abbeel P, et al (2017) Reinforcement learning with deep energy-based policies. In: ICML 2017, PMLR, pp 1352–1361 Haarnoja et al [2018] Haarnoja T, Zhou A, Hartikainen K, et al (2018) Soft actor-critic algorithms and applications. arXiv preprint arXiv:181205905 Heess et al [2015] Heess N, Wayne G, Silver D, et al (2015) Learning continuous control policies by stochastic value gradients. Advances in neural information processing systems 28 Kingma and Welling [2013] Kingma DP, Welling M (2013) Auto-encoding variational bayes. arXiv preprint arXiv:13126114 Kingma et al [2014] Kingma DP, Mohamed S, Jimenez Rezende D, et al (2014) Semi-supervised learning with deep generative models. NIPS 2014 27 Konda and Tsitsiklis [1999] Konda V, Tsitsiklis J (1999) Actor-critic algorithms. Advances in neural information processing systems 12 Kuznetsov et al [2020] Kuznetsov A, Shvechikov P, Grishin A, et al (2020) Controlling overestimation bias with truncated mixture of continuous distributional quantile critics. In: International Conference on Machine Learning, PMLR, pp 5556–5566 Lillicrap et al [2015] Lillicrap TP, Hunt JJ, Pritzel A, et al (2015) Continuous control with deep reinforcement learning. arXiv preprint arXiv:150902971 Lindenberg et al [2022] Lindenberg B, Nordqvist J, Lindahl KO (2022) Conjugated discrete distributions for distributional reinforcement learning. In: AAAI 2022, pp 7516–7524 Ma et al [2020] Ma X, Xia L, Zhou Z, et al (2020) Dsac: distributional soft actor critic for risk-sensitive reinforcement learning. arXiv preprint arXiv:200414547 Marzouk et al [2016] Marzouk Y, Moselhy T, Parno M, et al (2016) Sampling via measure transport: An introduction. Handbook of uncertainty quantification 1:2 Mnih et al [2015] Mnih V, Kavukcuoglu K, Silver D, et al (2015) Human-level control through deep reinforcement learning. nature 518(7540):529–533 MORIMURA [2010] MORIMURA T (2010) Parametric return density estimation for reinforcement learning. 
In: Conference on Uncertainty in Artificial Intelligence, 2010 Nam et al [2021] Nam DW, Kim Y, Park CY (2021) Gmac: A distributional perspective on actor-critic framework. In: ICML 2021, PMLR, pp 7927–7936 Ororbia and Kifer [2022] Ororbia A, Kifer D (2022) The neural coding framework for learning generative models. Nature communications 13(1):2064 Parno and Marzouk [2018] Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Fujimoto S, Hoof H, Meger D (2018) Addressing function approximation error in actor-critic methods. In: ICML 2018, PMLR, pp 1587–1596 Goodfellow et al [2020] Goodfellow I, Pouget-Abadie J, Mirza M, et al (2020) Generative adversarial networks. Communications of the ACM 63(11):139–144 Haarnoja et al [2017] Haarnoja T, Tang H, Abbeel P, et al (2017) Reinforcement learning with deep energy-based policies. In: ICML 2017, PMLR, pp 1352–1361 Haarnoja et al [2018] Haarnoja T, Zhou A, Hartikainen K, et al (2018) Soft actor-critic algorithms and applications. 
arXiv preprint arXiv:181205905 Heess et al [2015] Heess N, Wayne G, Silver D, et al (2015) Learning continuous control policies by stochastic value gradients. Advances in neural information processing systems 28 Kingma and Welling [2013] Kingma DP, Welling M (2013) Auto-encoding variational bayes. arXiv preprint arXiv:13126114 Kingma et al [2014] Kingma DP, Mohamed S, Jimenez Rezende D, et al (2014) Semi-supervised learning with deep generative models. NIPS 2014 27 Konda and Tsitsiklis [1999] Konda V, Tsitsiklis J (1999) Actor-critic algorithms. Advances in neural information processing systems 12 Kuznetsov et al [2020] Kuznetsov A, Shvechikov P, Grishin A, et al (2020) Controlling overestimation bias with truncated mixture of continuous distributional quantile critics. In: International Conference on Machine Learning, PMLR, pp 5556–5566 Lillicrap et al [2015] Lillicrap TP, Hunt JJ, Pritzel A, et al (2015) Continuous control with deep reinforcement learning. arXiv preprint arXiv:150902971 Lindenberg et al [2022] Lindenberg B, Nordqvist J, Lindahl KO (2022) Conjugated discrete distributions for distributional reinforcement learning. In: AAAI 2022, pp 7516–7524 Ma et al [2020] Ma X, Xia L, Zhou Z, et al (2020) Dsac: distributional soft actor critic for risk-sensitive reinforcement learning. arXiv preprint arXiv:200414547 Marzouk et al [2016] Marzouk Y, Moselhy T, Parno M, et al (2016) Sampling via measure transport: An introduction. Handbook of uncertainty quantification 1:2 Mnih et al [2015] Mnih V, Kavukcuoglu K, Silver D, et al (2015) Human-level control through deep reinforcement learning. nature 518(7540):529–533 MORIMURA [2010] MORIMURA T (2010) Parametric return density estimation for reinforcement learning. In: Conference on Uncertainty in Artificial Intelligence, 2010 Nam et al [2021] Nam DW, Kim Y, Park CY (2021) Gmac: A distributional perspective on actor-critic framework. In: ICML 2021, PMLR, pp 7927–7936 Ororbia and Kifer [2022] Ororbia A, Kifer D (2022) The neural coding framework for learning generative models. Nature communications 13(1):2064 Parno and Marzouk [2018] Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. 
arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Goodfellow I, Pouget-Abadie J, Mirza M, et al (2020) Generative adversarial networks. Communications of the ACM 63(11):139–144 Haarnoja et al [2017] Haarnoja T, Tang H, Abbeel P, et al (2017) Reinforcement learning with deep energy-based policies. In: ICML 2017, PMLR, pp 1352–1361 Haarnoja et al [2018] Haarnoja T, Zhou A, Hartikainen K, et al (2018) Soft actor-critic algorithms and applications. arXiv preprint arXiv:181205905 Heess et al [2015] Heess N, Wayne G, Silver D, et al (2015) Learning continuous control policies by stochastic value gradients. Advances in neural information processing systems 28 Kingma and Welling [2013] Kingma DP, Welling M (2013) Auto-encoding variational bayes. arXiv preprint arXiv:13126114 Kingma et al [2014] Kingma DP, Mohamed S, Jimenez Rezende D, et al (2014) Semi-supervised learning with deep generative models. NIPS 2014 27 Konda and Tsitsiklis [1999] Konda V, Tsitsiklis J (1999) Actor-critic algorithms. Advances in neural information processing systems 12 Kuznetsov et al [2020] Kuznetsov A, Shvechikov P, Grishin A, et al (2020) Controlling overestimation bias with truncated mixture of continuous distributional quantile critics. In: International Conference on Machine Learning, PMLR, pp 5556–5566 Lillicrap et al [2015] Lillicrap TP, Hunt JJ, Pritzel A, et al (2015) Continuous control with deep reinforcement learning. arXiv preprint arXiv:150902971 Lindenberg et al [2022] Lindenberg B, Nordqvist J, Lindahl KO (2022) Conjugated discrete distributions for distributional reinforcement learning. In: AAAI 2022, pp 7516–7524 Ma et al [2020] Ma X, Xia L, Zhou Z, et al (2020) Dsac: distributional soft actor critic for risk-sensitive reinforcement learning. arXiv preprint arXiv:200414547 Marzouk et al [2016] Marzouk Y, Moselhy T, Parno M, et al (2016) Sampling via measure transport: An introduction. Handbook of uncertainty quantification 1:2 Mnih et al [2015] Mnih V, Kavukcuoglu K, Silver D, et al (2015) Human-level control through deep reinforcement learning. nature 518(7540):529–533 MORIMURA [2010] MORIMURA T (2010) Parametric return density estimation for reinforcement learning. In: Conference on Uncertainty in Artificial Intelligence, 2010 Nam et al [2021] Nam DW, Kim Y, Park CY (2021) Gmac: A distributional perspective on actor-critic framework. 
In: ICML 2021, PMLR, pp 7927–7936 Ororbia and Kifer [2022] Ororbia A, Kifer D (2022) The neural coding framework for learning generative models. Nature communications 13(1):2064 Parno and Marzouk [2018] Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Haarnoja T, Tang H, Abbeel P, et al (2017) Reinforcement learning with deep energy-based policies. In: ICML 2017, PMLR, pp 1352–1361 Haarnoja et al [2018] Haarnoja T, Zhou A, Hartikainen K, et al (2018) Soft actor-critic algorithms and applications. arXiv preprint arXiv:181205905 Heess et al [2015] Heess N, Wayne G, Silver D, et al (2015) Learning continuous control policies by stochastic value gradients. Advances in neural information processing systems 28 Kingma and Welling [2013] Kingma DP, Welling M (2013) Auto-encoding variational bayes. arXiv preprint arXiv:13126114 Kingma et al [2014] Kingma DP, Mohamed S, Jimenez Rezende D, et al (2014) Semi-supervised learning with deep generative models. NIPS 2014 27 Konda and Tsitsiklis [1999] Konda V, Tsitsiklis J (1999) Actor-critic algorithms. 
Advances in neural information processing systems 12 Kuznetsov et al [2020] Kuznetsov A, Shvechikov P, Grishin A, et al (2020) Controlling overestimation bias with truncated mixture of continuous distributional quantile critics. In: International Conference on Machine Learning, PMLR, pp 5556–5566 Lillicrap et al [2015] Lillicrap TP, Hunt JJ, Pritzel A, et al (2015) Continuous control with deep reinforcement learning. arXiv preprint arXiv:150902971 Lindenberg et al [2022] Lindenberg B, Nordqvist J, Lindahl KO (2022) Conjugated discrete distributions for distributional reinforcement learning. In: AAAI 2022, pp 7516–7524 Ma et al [2020] Ma X, Xia L, Zhou Z, et al (2020) Dsac: distributional soft actor critic for risk-sensitive reinforcement learning. arXiv preprint arXiv:200414547 Marzouk et al [2016] Marzouk Y, Moselhy T, Parno M, et al (2016) Sampling via measure transport: An introduction. Handbook of uncertainty quantification 1:2 Mnih et al [2015] Mnih V, Kavukcuoglu K, Silver D, et al (2015) Human-level control through deep reinforcement learning. nature 518(7540):529–533 MORIMURA [2010] MORIMURA T (2010) Parametric return density estimation for reinforcement learning. In: Conference on Uncertainty in Artificial Intelligence, 2010 Nam et al [2021] Nam DW, Kim Y, Park CY (2021) Gmac: A distributional perspective on actor-critic framework. In: ICML 2021, PMLR, pp 7927–7936 Ororbia and Kifer [2022] Ororbia A, Kifer D (2022) The neural coding framework for learning generative models. Nature communications 13(1):2064 Parno and Marzouk [2018] Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. 
MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Haarnoja T, Zhou A, Hartikainen K, et al (2018) Soft actor-critic algorithms and applications. arXiv preprint arXiv:181205905 Heess et al [2015] Heess N, Wayne G, Silver D, et al (2015) Learning continuous control policies by stochastic value gradients. Advances in neural information processing systems 28 Kingma and Welling [2013] Kingma DP, Welling M (2013) Auto-encoding variational bayes. arXiv preprint arXiv:13126114 Kingma et al [2014] Kingma DP, Mohamed S, Jimenez Rezende D, et al (2014) Semi-supervised learning with deep generative models. NIPS 2014 27 Konda and Tsitsiklis [1999] Konda V, Tsitsiklis J (1999) Actor-critic algorithms. Advances in neural information processing systems 12 Kuznetsov et al [2020] Kuznetsov A, Shvechikov P, Grishin A, et al (2020) Controlling overestimation bias with truncated mixture of continuous distributional quantile critics. In: International Conference on Machine Learning, PMLR, pp 5556–5566 Lillicrap et al [2015] Lillicrap TP, Hunt JJ, Pritzel A, et al (2015) Continuous control with deep reinforcement learning. arXiv preprint arXiv:150902971 Lindenberg et al [2022] Lindenberg B, Nordqvist J, Lindahl KO (2022) Conjugated discrete distributions for distributional reinforcement learning. In: AAAI 2022, pp 7516–7524 Ma et al [2020] Ma X, Xia L, Zhou Z, et al (2020) Dsac: distributional soft actor critic for risk-sensitive reinforcement learning. arXiv preprint arXiv:200414547 Marzouk et al [2016] Marzouk Y, Moselhy T, Parno M, et al (2016) Sampling via measure transport: An introduction. Handbook of uncertainty quantification 1:2 Mnih et al [2015] Mnih V, Kavukcuoglu K, Silver D, et al (2015) Human-level control through deep reinforcement learning. nature 518(7540):529–533 MORIMURA [2010] MORIMURA T (2010) Parametric return density estimation for reinforcement learning. In: Conference on Uncertainty in Artificial Intelligence, 2010 Nam et al [2021] Nam DW, Kim Y, Park CY (2021) Gmac: A distributional perspective on actor-critic framework. In: ICML 2021, PMLR, pp 7927–7936 Ororbia and Kifer [2022] Ororbia A, Kifer D (2022) The neural coding framework for learning generative models. Nature communications 13(1):2064 Parno and Marzouk [2018] Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. 
Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Heess N, Wayne G, Silver D, et al (2015) Learning continuous control policies by stochastic value gradients. Advances in neural information processing systems 28 Kingma and Welling [2013] Kingma DP, Welling M (2013) Auto-encoding variational bayes. arXiv preprint arXiv:13126114 Kingma et al [2014] Kingma DP, Mohamed S, Jimenez Rezende D, et al (2014) Semi-supervised learning with deep generative models. NIPS 2014 27 Konda and Tsitsiklis [1999] Konda V, Tsitsiklis J (1999) Actor-critic algorithms. Advances in neural information processing systems 12 Kuznetsov et al [2020] Kuznetsov A, Shvechikov P, Grishin A, et al (2020) Controlling overestimation bias with truncated mixture of continuous distributional quantile critics. In: International Conference on Machine Learning, PMLR, pp 5556–5566 Lillicrap et al [2015] Lillicrap TP, Hunt JJ, Pritzel A, et al (2015) Continuous control with deep reinforcement learning. arXiv preprint arXiv:150902971 Lindenberg et al [2022] Lindenberg B, Nordqvist J, Lindahl KO (2022) Conjugated discrete distributions for distributional reinforcement learning. In: AAAI 2022, pp 7516–7524 Ma et al [2020] Ma X, Xia L, Zhou Z, et al (2020) Dsac: distributional soft actor critic for risk-sensitive reinforcement learning. arXiv preprint arXiv:200414547 Marzouk et al [2016] Marzouk Y, Moselhy T, Parno M, et al (2016) Sampling via measure transport: An introduction. Handbook of uncertainty quantification 1:2 Mnih et al [2015] Mnih V, Kavukcuoglu K, Silver D, et al (2015) Human-level control through deep reinforcement learning. nature 518(7540):529–533 MORIMURA [2010] MORIMURA T (2010) Parametric return density estimation for reinforcement learning. 
In: Conference on Uncertainty in Artificial Intelligence, 2010 Nam et al [2021] Nam DW, Kim Y, Park CY (2021) Gmac: A distributional perspective on actor-critic framework. In: ICML 2021, PMLR, pp 7927–7936 Ororbia and Kifer [2022] Ororbia A, Kifer D (2022) The neural coding framework for learning generative models. Nature communications 13(1):2064 Parno and Marzouk [2018] Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Kingma DP, Welling M (2013) Auto-encoding variational bayes. arXiv preprint arXiv:13126114 Kingma et al [2014] Kingma DP, Mohamed S, Jimenez Rezende D, et al (2014) Semi-supervised learning with deep generative models. NIPS 2014 27 Konda and Tsitsiklis [1999] Konda V, Tsitsiklis J (1999) Actor-critic algorithms. Advances in neural information processing systems 12 Kuznetsov et al [2020] Kuznetsov A, Shvechikov P, Grishin A, et al (2020) Controlling overestimation bias with truncated mixture of continuous distributional quantile critics. 
In: International Conference on Machine Learning, PMLR, pp 5556–5566 Lillicrap et al [2015] Lillicrap TP, Hunt JJ, Pritzel A, et al (2015) Continuous control with deep reinforcement learning. arXiv preprint arXiv:150902971 Lindenberg et al [2022] Lindenberg B, Nordqvist J, Lindahl KO (2022) Conjugated discrete distributions for distributional reinforcement learning. In: AAAI 2022, pp 7516–7524 Ma et al [2020] Ma X, Xia L, Zhou Z, et al (2020) Dsac: distributional soft actor critic for risk-sensitive reinforcement learning. arXiv preprint arXiv:200414547 Marzouk et al [2016] Marzouk Y, Moselhy T, Parno M, et al (2016) Sampling via measure transport: An introduction. Handbook of uncertainty quantification 1:2 Mnih et al [2015] Mnih V, Kavukcuoglu K, Silver D, et al (2015) Human-level control through deep reinforcement learning. nature 518(7540):529–533 MORIMURA [2010] MORIMURA T (2010) Parametric return density estimation for reinforcement learning. In: Conference on Uncertainty in Artificial Intelligence, 2010 Nam et al [2021] Nam DW, Kim Y, Park CY (2021) Gmac: A distributional perspective on actor-critic framework. In: ICML 2021, PMLR, pp 7927–7936 Ororbia and Kifer [2022] Ororbia A, Kifer D (2022) The neural coding framework for learning generative models. Nature communications 13(1):2064 Parno and Marzouk [2018] Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. 
NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Kingma DP, Mohamed S, Jimenez Rezende D, et al (2014) Semi-supervised learning with deep generative models. NIPS 2014 27 Konda and Tsitsiklis [1999] Konda V, Tsitsiklis J (1999) Actor-critic algorithms. Advances in neural information processing systems 12 Kuznetsov et al [2020] Kuznetsov A, Shvechikov P, Grishin A, et al (2020) Controlling overestimation bias with truncated mixture of continuous distributional quantile critics. In: International Conference on Machine Learning, PMLR, pp 5556–5566 Lillicrap et al [2015] Lillicrap TP, Hunt JJ, Pritzel A, et al (2015) Continuous control with deep reinforcement learning. arXiv preprint arXiv:150902971 Lindenberg et al [2022] Lindenberg B, Nordqvist J, Lindahl KO (2022) Conjugated discrete distributions for distributional reinforcement learning. In: AAAI 2022, pp 7516–7524 Ma et al [2020] Ma X, Xia L, Zhou Z, et al (2020) Dsac: distributional soft actor critic for risk-sensitive reinforcement learning. arXiv preprint arXiv:200414547 Marzouk et al [2016] Marzouk Y, Moselhy T, Parno M, et al (2016) Sampling via measure transport: An introduction. Handbook of uncertainty quantification 1:2 Mnih et al [2015] Mnih V, Kavukcuoglu K, Silver D, et al (2015) Human-level control through deep reinforcement learning. nature 518(7540):529–533 MORIMURA [2010] MORIMURA T (2010) Parametric return density estimation for reinforcement learning. In: Conference on Uncertainty in Artificial Intelligence, 2010 Nam et al [2021] Nam DW, Kim Y, Park CY (2021) Gmac: A distributional perspective on actor-critic framework. In: ICML 2021, PMLR, pp 7927–7936 Ororbia and Kifer [2022] Ororbia A, Kifer D (2022) The neural coding framework for learning generative models. Nature communications 13(1):2064 Parno and Marzouk [2018] Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. 
In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Konda V, Tsitsiklis J (1999) Actor-critic algorithms. Advances in neural information processing systems 12 Kuznetsov et al [2020] Kuznetsov A, Shvechikov P, Grishin A, et al (2020) Controlling overestimation bias with truncated mixture of continuous distributional quantile critics. In: International Conference on Machine Learning, PMLR, pp 5556–5566 Lillicrap et al [2015] Lillicrap TP, Hunt JJ, Pritzel A, et al (2015) Continuous control with deep reinforcement learning. arXiv preprint arXiv:150902971 Lindenberg et al [2022] Lindenberg B, Nordqvist J, Lindahl KO (2022) Conjugated discrete distributions for distributional reinforcement learning. In: AAAI 2022, pp 7516–7524 Ma et al [2020] Ma X, Xia L, Zhou Z, et al (2020) Dsac: distributional soft actor critic for risk-sensitive reinforcement learning. arXiv preprint arXiv:200414547 Marzouk et al [2016] Marzouk Y, Moselhy T, Parno M, et al (2016) Sampling via measure transport: An introduction. Handbook of uncertainty quantification 1:2 Mnih et al [2015] Mnih V, Kavukcuoglu K, Silver D, et al (2015) Human-level control through deep reinforcement learning. nature 518(7540):529–533 MORIMURA [2010] MORIMURA T (2010) Parametric return density estimation for reinforcement learning. In: Conference on Uncertainty in Artificial Intelligence, 2010 Nam et al [2021] Nam DW, Kim Y, Park CY (2021) Gmac: A distributional perspective on actor-critic framework. In: ICML 2021, PMLR, pp 7927–7936 Ororbia and Kifer [2022] Ororbia A, Kifer D (2022) The neural coding framework for learning generative models. Nature communications 13(1):2064 Parno and Marzouk [2018] Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. 
Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Kuznetsov A, Shvechikov P, Grishin A, et al (2020) Controlling overestimation bias with truncated mixture of continuous distributional quantile critics. In: International Conference on Machine Learning, PMLR, pp 5556–5566 Lillicrap et al [2015] Lillicrap TP, Hunt JJ, Pritzel A, et al (2015) Continuous control with deep reinforcement learning. arXiv preprint arXiv:150902971 Lindenberg et al [2022] Lindenberg B, Nordqvist J, Lindahl KO (2022) Conjugated discrete distributions for distributional reinforcement learning. In: AAAI 2022, pp 7516–7524 Ma et al [2020] Ma X, Xia L, Zhou Z, et al (2020) Dsac: distributional soft actor critic for risk-sensitive reinforcement learning. arXiv preprint arXiv:200414547 Marzouk et al [2016] Marzouk Y, Moselhy T, Parno M, et al (2016) Sampling via measure transport: An introduction. Handbook of uncertainty quantification 1:2 Mnih et al [2015] Mnih V, Kavukcuoglu K, Silver D, et al (2015) Human-level control through deep reinforcement learning. nature 518(7540):529–533 MORIMURA [2010] MORIMURA T (2010) Parametric return density estimation for reinforcement learning. In: Conference on Uncertainty in Artificial Intelligence, 2010 Nam et al [2021] Nam DW, Kim Y, Park CY (2021) Gmac: A distributional perspective on actor-critic framework. In: ICML 2021, PMLR, pp 7927–7936 Ororbia and Kifer [2022] Ororbia A, Kifer D (2022) The neural coding framework for learning generative models. Nature communications 13(1):2064 Parno and Marzouk [2018] Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. 
Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Lillicrap TP, Hunt JJ, Pritzel A, et al (2015) Continuous control with deep reinforcement learning. arXiv preprint arXiv:150902971 Lindenberg et al [2022] Lindenberg B, Nordqvist J, Lindahl KO (2022) Conjugated discrete distributions for distributional reinforcement learning. In: AAAI 2022, pp 7516–7524 Ma et al [2020] Ma X, Xia L, Zhou Z, et al (2020) Dsac: distributional soft actor critic for risk-sensitive reinforcement learning. arXiv preprint arXiv:200414547 Marzouk et al [2016] Marzouk Y, Moselhy T, Parno M, et al (2016) Sampling via measure transport: An introduction. Handbook of uncertainty quantification 1:2 Mnih et al [2015] Mnih V, Kavukcuoglu K, Silver D, et al (2015) Human-level control through deep reinforcement learning. nature 518(7540):529–533 MORIMURA [2010] MORIMURA T (2010) Parametric return density estimation for reinforcement learning. In: Conference on Uncertainty in Artificial Intelligence, 2010 Nam et al [2021] Nam DW, Kim Y, Park CY (2021) Gmac: A distributional perspective on actor-critic framework. In: ICML 2021, PMLR, pp 7927–7936 Ororbia and Kifer [2022] Ororbia A, Kifer D (2022) The neural coding framework for learning generative models. 
Nature communications 13(1):2064 Parno and Marzouk [2018] Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Lindenberg B, Nordqvist J, Lindahl KO (2022) Conjugated discrete distributions for distributional reinforcement learning. In: AAAI 2022, pp 7516–7524 Ma et al [2020] Ma X, Xia L, Zhou Z, et al (2020) Dsac: distributional soft actor critic for risk-sensitive reinforcement learning. arXiv preprint arXiv:200414547 Marzouk et al [2016] Marzouk Y, Moselhy T, Parno M, et al (2016) Sampling via measure transport: An introduction. Handbook of uncertainty quantification 1:2 Mnih et al [2015] Mnih V, Kavukcuoglu K, Silver D, et al (2015) Human-level control through deep reinforcement learning. nature 518(7540):529–533 MORIMURA [2010] MORIMURA T (2010) Parametric return density estimation for reinforcement learning. In: Conference on Uncertainty in Artificial Intelligence, 2010 Nam et al [2021] Nam DW, Kim Y, Park CY (2021) Gmac: A distributional perspective on actor-critic framework. 
In: ICML 2021, PMLR, pp 7927–7936 Ororbia and Kifer [2022] Ororbia A, Kifer D (2022) The neural coding framework for learning generative models. Nature communications 13(1):2064 Parno and Marzouk [2018] Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Ma X, Xia L, Zhou Z, et al (2020) Dsac: distributional soft actor critic for risk-sensitive reinforcement learning. arXiv preprint arXiv:200414547 Marzouk et al [2016] Marzouk Y, Moselhy T, Parno M, et al (2016) Sampling via measure transport: An introduction. Handbook of uncertainty quantification 1:2 Mnih et al [2015] Mnih V, Kavukcuoglu K, Silver D, et al (2015) Human-level control through deep reinforcement learning. nature 518(7540):529–533 MORIMURA [2010] MORIMURA T (2010) Parametric return density estimation for reinforcement learning. In: Conference on Uncertainty in Artificial Intelligence, 2010 Nam et al [2021] Nam DW, Kim Y, Park CY (2021) Gmac: A distributional perspective on actor-critic framework. 
In: ICML 2021, PMLR, pp 7927–7936 Ororbia and Kifer [2022] Ororbia A, Kifer D (2022) The neural coding framework for learning generative models. Nature communications 13(1):2064 Parno and Marzouk [2018] Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Marzouk Y, Moselhy T, Parno M, et al (2016) Sampling via measure transport: An introduction. Handbook of uncertainty quantification 1:2 Mnih et al [2015] Mnih V, Kavukcuoglu K, Silver D, et al (2015) Human-level control through deep reinforcement learning. nature 518(7540):529–533 MORIMURA [2010] MORIMURA T (2010) Parametric return density estimation for reinforcement learning. In: Conference on Uncertainty in Artificial Intelligence, 2010 Nam et al [2021] Nam DW, Kim Y, Park CY (2021) Gmac: A distributional perspective on actor-critic framework. In: ICML 2021, PMLR, pp 7927–7936 Ororbia and Kifer [2022] Ororbia A, Kifer D (2022) The neural coding framework for learning generative models. 
Nature communications 13(1):2064 Parno and Marzouk [2018] Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Mnih V, Kavukcuoglu K, Silver D, et al (2015) Human-level control through deep reinforcement learning. nature 518(7540):529–533 MORIMURA [2010] MORIMURA T (2010) Parametric return density estimation for reinforcement learning. In: Conference on Uncertainty in Artificial Intelligence, 2010 Nam et al [2021] Nam DW, Kim Y, Park CY (2021) Gmac: A distributional perspective on actor-critic framework. In: ICML 2021, PMLR, pp 7927–7936 Ororbia and Kifer [2022] Ororbia A, Kifer D (2022) The neural coding framework for learning generative models. Nature communications 13(1):2064 Parno and Marzouk [2018] Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. 
Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University MORIMURA T (2010) Parametric return density estimation for reinforcement learning. In: Conference on Uncertainty in Artificial Intelligence, 2010 Nam et al [2021] Nam DW, Kim Y, Park CY (2021) Gmac: A distributional perspective on actor-critic framework. In: ICML 2021, PMLR, pp 7927–7936 Ororbia and Kifer [2022] Ororbia A, Kifer D (2022) The neural coding framework for learning generative models. Nature communications 13(1):2064 Parno and Marzouk [2018] Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. 
Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Duan J, Guan Y, Li SE, et al (2021) Distributional soft actor-critic: Off-policy reinforcement learning for addressing value estimation errors. IEEE transactions on neural networks and learning systems Engel et al [2005] Engel Y, Mannor S, Meir R (2005) Reinforcement learning with gaussian processes. In: Proceedings of the 22nd international conference on Machine learning, pp 201–208 Fujimoto et al [2018] Fujimoto S, Hoof H, Meger D (2018) Addressing function approximation error in actor-critic methods. In: ICML 2018, PMLR, pp 1587–1596 Goodfellow et al [2020] Goodfellow I, Pouget-Abadie J, Mirza M, et al (2020) Generative adversarial networks. Communications of the ACM 63(11):139–144 Haarnoja et al [2017] Haarnoja T, Tang H, Abbeel P, et al (2017) Reinforcement learning with deep energy-based policies. In: ICML 2017, PMLR, pp 1352–1361 Haarnoja et al [2018] Haarnoja T, Zhou A, Hartikainen K, et al (2018) Soft actor-critic algorithms and applications. arXiv preprint arXiv:181205905 Heess et al [2015] Heess N, Wayne G, Silver D, et al (2015) Learning continuous control policies by stochastic value gradients. Advances in neural information processing systems 28 Kingma and Welling [2013] Kingma DP, Welling M (2013) Auto-encoding variational bayes. 
arXiv preprint arXiv:13126114 Kingma et al [2014] Kingma DP, Mohamed S, Jimenez Rezende D, et al (2014) Semi-supervised learning with deep generative models. NIPS 2014 27 Konda and Tsitsiklis [1999] Konda V, Tsitsiklis J (1999) Actor-critic algorithms. Advances in neural information processing systems 12 Kuznetsov et al [2020] Kuznetsov A, Shvechikov P, Grishin A, et al (2020) Controlling overestimation bias with truncated mixture of continuous distributional quantile critics. In: International Conference on Machine Learning, PMLR, pp 5556–5566 Lillicrap et al [2015] Lillicrap TP, Hunt JJ, Pritzel A, et al (2015) Continuous control with deep reinforcement learning. arXiv preprint arXiv:150902971 Lindenberg et al [2022] Lindenberg B, Nordqvist J, Lindahl KO (2022) Conjugated discrete distributions for distributional reinforcement learning. In: AAAI 2022, pp 7516–7524 Ma et al [2020] Ma X, Xia L, Zhou Z, et al (2020) Dsac: distributional soft actor critic for risk-sensitive reinforcement learning. arXiv preprint arXiv:200414547 Marzouk et al [2016] Marzouk Y, Moselhy T, Parno M, et al (2016) Sampling via measure transport: An introduction. Handbook of uncertainty quantification 1:2 Mnih et al [2015] Mnih V, Kavukcuoglu K, Silver D, et al (2015) Human-level control through deep reinforcement learning. nature 518(7540):529–533 MORIMURA [2010] MORIMURA T (2010) Parametric return density estimation for reinforcement learning. In: Conference on Uncertainty in Artificial Intelligence, 2010 Nam et al [2021] Nam DW, Kim Y, Park CY (2021) Gmac: A distributional perspective on actor-critic framework. In: ICML 2021, PMLR, pp 7927–7936 Ororbia and Kifer [2022] Ororbia A, Kifer D (2022) The neural coding framework for learning generative models. Nature communications 13(1):2064 Parno and Marzouk [2018] Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. 
In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Engel Y, Mannor S, Meir R (2005) Reinforcement learning with gaussian processes. In: Proceedings of the 22nd international conference on Machine learning, pp 201–208 Fujimoto et al [2018] Fujimoto S, Hoof H, Meger D (2018) Addressing function approximation error in actor-critic methods. In: ICML 2018, PMLR, pp 1587–1596 Goodfellow et al [2020] Goodfellow I, Pouget-Abadie J, Mirza M, et al (2020) Generative adversarial networks. Communications of the ACM 63(11):139–144 Haarnoja et al [2017] Haarnoja T, Tang H, Abbeel P, et al (2017) Reinforcement learning with deep energy-based policies. In: ICML 2017, PMLR, pp 1352–1361 Haarnoja et al [2018] Haarnoja T, Zhou A, Hartikainen K, et al (2018) Soft actor-critic algorithms and applications. arXiv preprint arXiv:181205905 Heess et al [2015] Heess N, Wayne G, Silver D, et al (2015) Learning continuous control policies by stochastic value gradients. Advances in neural information processing systems 28 Kingma and Welling [2013] Kingma DP, Welling M (2013) Auto-encoding variational bayes. arXiv preprint arXiv:13126114 Kingma et al [2014] Kingma DP, Mohamed S, Jimenez Rezende D, et al (2014) Semi-supervised learning with deep generative models. NIPS 2014 27 Konda and Tsitsiklis [1999] Konda V, Tsitsiklis J (1999) Actor-critic algorithms. Advances in neural information processing systems 12 Kuznetsov et al [2020] Kuznetsov A, Shvechikov P, Grishin A, et al (2020) Controlling overestimation bias with truncated mixture of continuous distributional quantile critics. In: International Conference on Machine Learning, PMLR, pp 5556–5566 Lillicrap et al [2015] Lillicrap TP, Hunt JJ, Pritzel A, et al (2015) Continuous control with deep reinforcement learning. arXiv preprint arXiv:150902971 Lindenberg et al [2022] Lindenberg B, Nordqvist J, Lindahl KO (2022) Conjugated discrete distributions for distributional reinforcement learning. In: AAAI 2022, pp 7516–7524 Ma et al [2020] Ma X, Xia L, Zhou Z, et al (2020) Dsac: distributional soft actor critic for risk-sensitive reinforcement learning. arXiv preprint arXiv:200414547 Marzouk et al [2016] Marzouk Y, Moselhy T, Parno M, et al (2016) Sampling via measure transport: An introduction. Handbook of uncertainty quantification 1:2 Mnih et al [2015] Mnih V, Kavukcuoglu K, Silver D, et al (2015) Human-level control through deep reinforcement learning. nature 518(7540):529–533 MORIMURA [2010] MORIMURA T (2010) Parametric return density estimation for reinforcement learning. 
In: Conference on Uncertainty in Artificial Intelligence, 2010 Nam et al [2021] Nam DW, Kim Y, Park CY (2021) Gmac: A distributional perspective on actor-critic framework. In: ICML 2021, PMLR, pp 7927–7936 Ororbia and Kifer [2022] Ororbia A, Kifer D (2022) The neural coding framework for learning generative models. Nature communications 13(1):2064 Parno and Marzouk [2018] Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Fujimoto S, Hoof H, Meger D (2018) Addressing function approximation error in actor-critic methods. In: ICML 2018, PMLR, pp 1587–1596 Goodfellow et al [2020] Goodfellow I, Pouget-Abadie J, Mirza M, et al (2020) Generative adversarial networks. Communications of the ACM 63(11):139–144 Haarnoja et al [2017] Haarnoja T, Tang H, Abbeel P, et al (2017) Reinforcement learning with deep energy-based policies. In: ICML 2017, PMLR, pp 1352–1361 Haarnoja et al [2018] Haarnoja T, Zhou A, Hartikainen K, et al (2018) Soft actor-critic algorithms and applications. 
arXiv preprint arXiv:181205905 Heess et al [2015] Heess N, Wayne G, Silver D, et al (2015) Learning continuous control policies by stochastic value gradients. Advances in neural information processing systems 28 Kingma and Welling [2013] Kingma DP, Welling M (2013) Auto-encoding variational bayes. arXiv preprint arXiv:13126114 Kingma et al [2014] Kingma DP, Mohamed S, Jimenez Rezende D, et al (2014) Semi-supervised learning with deep generative models. NIPS 2014 27 Konda and Tsitsiklis [1999] Konda V, Tsitsiklis J (1999) Actor-critic algorithms. Advances in neural information processing systems 12 Kuznetsov et al [2020] Kuznetsov A, Shvechikov P, Grishin A, et al (2020) Controlling overestimation bias with truncated mixture of continuous distributional quantile critics. In: International Conference on Machine Learning, PMLR, pp 5556–5566 Lillicrap et al [2015] Lillicrap TP, Hunt JJ, Pritzel A, et al (2015) Continuous control with deep reinforcement learning. arXiv preprint arXiv:150902971 Lindenberg et al [2022] Lindenberg B, Nordqvist J, Lindahl KO (2022) Conjugated discrete distributions for distributional reinforcement learning. In: AAAI 2022, pp 7516–7524 Ma et al [2020] Ma X, Xia L, Zhou Z, et al (2020) Dsac: distributional soft actor critic for risk-sensitive reinforcement learning. arXiv preprint arXiv:200414547 Marzouk et al [2016] Marzouk Y, Moselhy T, Parno M, et al (2016) Sampling via measure transport: An introduction. Handbook of uncertainty quantification 1:2 Mnih et al [2015] Mnih V, Kavukcuoglu K, Silver D, et al (2015) Human-level control through deep reinforcement learning. nature 518(7540):529–533 MORIMURA [2010] MORIMURA T (2010) Parametric return density estimation for reinforcement learning. In: Conference on Uncertainty in Artificial Intelligence, 2010 Nam et al [2021] Nam DW, Kim Y, Park CY (2021) Gmac: A distributional perspective on actor-critic framework. In: ICML 2021, PMLR, pp 7927–7936 Ororbia and Kifer [2022] Ororbia A, Kifer D (2022) The neural coding framework for learning generative models. Nature communications 13(1):2064 Parno and Marzouk [2018] Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. 
arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Goodfellow I, Pouget-Abadie J, Mirza M, et al (2020) Generative adversarial networks. Communications of the ACM 63(11):139–144 Haarnoja et al [2017] Haarnoja T, Tang H, Abbeel P, et al (2017) Reinforcement learning with deep energy-based policies. In: ICML 2017, PMLR, pp 1352–1361 Haarnoja et al [2018] Haarnoja T, Zhou A, Hartikainen K, et al (2018) Soft actor-critic algorithms and applications. arXiv preprint arXiv:181205905 Heess et al [2015] Heess N, Wayne G, Silver D, et al (2015) Learning continuous control policies by stochastic value gradients. Advances in neural information processing systems 28 Kingma and Welling [2013] Kingma DP, Welling M (2013) Auto-encoding variational bayes. arXiv preprint arXiv:13126114 Kingma et al [2014] Kingma DP, Mohamed S, Jimenez Rezende D, et al (2014) Semi-supervised learning with deep generative models. NIPS 2014 27 Konda and Tsitsiklis [1999] Konda V, Tsitsiklis J (1999) Actor-critic algorithms. Advances in neural information processing systems 12 Kuznetsov et al [2020] Kuznetsov A, Shvechikov P, Grishin A, et al (2020) Controlling overestimation bias with truncated mixture of continuous distributional quantile critics. In: International Conference on Machine Learning, PMLR, pp 5556–5566 Lillicrap et al [2015] Lillicrap TP, Hunt JJ, Pritzel A, et al (2015) Continuous control with deep reinforcement learning. arXiv preprint arXiv:150902971 Lindenberg et al [2022] Lindenberg B, Nordqvist J, Lindahl KO (2022) Conjugated discrete distributions for distributional reinforcement learning. In: AAAI 2022, pp 7516–7524 Ma et al [2020] Ma X, Xia L, Zhou Z, et al (2020) Dsac: distributional soft actor critic for risk-sensitive reinforcement learning. arXiv preprint arXiv:200414547 Marzouk et al [2016] Marzouk Y, Moselhy T, Parno M, et al (2016) Sampling via measure transport: An introduction. Handbook of uncertainty quantification 1:2 Mnih et al [2015] Mnih V, Kavukcuoglu K, Silver D, et al (2015) Human-level control through deep reinforcement learning. nature 518(7540):529–533 MORIMURA [2010] MORIMURA T (2010) Parametric return density estimation for reinforcement learning. In: Conference on Uncertainty in Artificial Intelligence, 2010 Nam et al [2021] Nam DW, Kim Y, Park CY (2021) Gmac: A distributional perspective on actor-critic framework. 
In: ICML 2021, PMLR, pp 7927–7936 Ororbia and Kifer [2022] Ororbia A, Kifer D (2022) The neural coding framework for learning generative models. Nature communications 13(1):2064 Parno and Marzouk [2018] Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Haarnoja T, Tang H, Abbeel P, et al (2017) Reinforcement learning with deep energy-based policies. In: ICML 2017, PMLR, pp 1352–1361 Haarnoja et al [2018] Haarnoja T, Zhou A, Hartikainen K, et al (2018) Soft actor-critic algorithms and applications. arXiv preprint arXiv:181205905 Heess et al [2015] Heess N, Wayne G, Silver D, et al (2015) Learning continuous control policies by stochastic value gradients. Advances in neural information processing systems 28 Kingma and Welling [2013] Kingma DP, Welling M (2013) Auto-encoding variational bayes. arXiv preprint arXiv:13126114 Kingma et al [2014] Kingma DP, Mohamed S, Jimenez Rezende D, et al (2014) Semi-supervised learning with deep generative models. NIPS 2014 27 Konda and Tsitsiklis [1999] Konda V, Tsitsiklis J (1999) Actor-critic algorithms. 
Advances in neural information processing systems 12 Kuznetsov et al [2020] Kuznetsov A, Shvechikov P, Grishin A, et al (2020) Controlling overestimation bias with truncated mixture of continuous distributional quantile critics. In: International Conference on Machine Learning, PMLR, pp 5556–5566 Lillicrap et al [2015] Lillicrap TP, Hunt JJ, Pritzel A, et al (2015) Continuous control with deep reinforcement learning. arXiv preprint arXiv:150902971 Lindenberg et al [2022] Lindenberg B, Nordqvist J, Lindahl KO (2022) Conjugated discrete distributions for distributional reinforcement learning. In: AAAI 2022, pp 7516–7524 Ma et al [2020] Ma X, Xia L, Zhou Z, et al (2020) Dsac: distributional soft actor critic for risk-sensitive reinforcement learning. arXiv preprint arXiv:200414547 Marzouk et al [2016] Marzouk Y, Moselhy T, Parno M, et al (2016) Sampling via measure transport: An introduction. Handbook of uncertainty quantification 1:2 Mnih et al [2015] Mnih V, Kavukcuoglu K, Silver D, et al (2015) Human-level control through deep reinforcement learning. nature 518(7540):529–533 MORIMURA [2010] MORIMURA T (2010) Parametric return density estimation for reinforcement learning. In: Conference on Uncertainty in Artificial Intelligence, 2010 Nam et al [2021] Nam DW, Kim Y, Park CY (2021) Gmac: A distributional perspective on actor-critic framework. In: ICML 2021, PMLR, pp 7927–7936 Ororbia and Kifer [2022] Ororbia A, Kifer D (2022) The neural coding framework for learning generative models. Nature communications 13(1):2064 Parno and Marzouk [2018] Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. 
MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Haarnoja T, Zhou A, Hartikainen K, et al (2018) Soft actor-critic algorithms and applications. arXiv preprint arXiv:181205905 Heess et al [2015] Heess N, Wayne G, Silver D, et al (2015) Learning continuous control policies by stochastic value gradients. Advances in neural information processing systems 28 Kingma and Welling [2013] Kingma DP, Welling M (2013) Auto-encoding variational bayes. arXiv preprint arXiv:13126114 Kingma et al [2014] Kingma DP, Mohamed S, Jimenez Rezende D, et al (2014) Semi-supervised learning with deep generative models. NIPS 2014 27 Konda and Tsitsiklis [1999] Konda V, Tsitsiklis J (1999) Actor-critic algorithms. Advances in neural information processing systems 12 Kuznetsov et al [2020] Kuznetsov A, Shvechikov P, Grishin A, et al (2020) Controlling overestimation bias with truncated mixture of continuous distributional quantile critics. In: International Conference on Machine Learning, PMLR, pp 5556–5566 Lillicrap et al [2015] Lillicrap TP, Hunt JJ, Pritzel A, et al (2015) Continuous control with deep reinforcement learning. arXiv preprint arXiv:150902971 Lindenberg et al [2022] Lindenberg B, Nordqvist J, Lindahl KO (2022) Conjugated discrete distributions for distributional reinforcement learning. In: AAAI 2022, pp 7516–7524 Ma et al [2020] Ma X, Xia L, Zhou Z, et al (2020) Dsac: distributional soft actor critic for risk-sensitive reinforcement learning. arXiv preprint arXiv:200414547 Marzouk et al [2016] Marzouk Y, Moselhy T, Parno M, et al (2016) Sampling via measure transport: An introduction. Handbook of uncertainty quantification 1:2 Mnih et al [2015] Mnih V, Kavukcuoglu K, Silver D, et al (2015) Human-level control through deep reinforcement learning. nature 518(7540):529–533 MORIMURA [2010] MORIMURA T (2010) Parametric return density estimation for reinforcement learning. In: Conference on Uncertainty in Artificial Intelligence, 2010 Nam et al [2021] Nam DW, Kim Y, Park CY (2021) Gmac: A distributional perspective on actor-critic framework. In: ICML 2021, PMLR, pp 7927–7936 Ororbia and Kifer [2022] Ororbia A, Kifer D (2022) The neural coding framework for learning generative models. Nature communications 13(1):2064 Parno and Marzouk [2018] Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. 
Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Heess N, Wayne G, Silver D, et al (2015) Learning continuous control policies by stochastic value gradients. Advances in neural information processing systems 28 Kingma and Welling [2013] Kingma DP, Welling M (2013) Auto-encoding variational bayes. arXiv preprint arXiv:13126114 Kingma et al [2014] Kingma DP, Mohamed S, Jimenez Rezende D, et al (2014) Semi-supervised learning with deep generative models. NIPS 2014 27 Konda and Tsitsiklis [1999] Konda V, Tsitsiklis J (1999) Actor-critic algorithms. Advances in neural information processing systems 12 Kuznetsov et al [2020] Kuznetsov A, Shvechikov P, Grishin A, et al (2020) Controlling overestimation bias with truncated mixture of continuous distributional quantile critics. In: International Conference on Machine Learning, PMLR, pp 5556–5566 Lillicrap et al [2015] Lillicrap TP, Hunt JJ, Pritzel A, et al (2015) Continuous control with deep reinforcement learning. arXiv preprint arXiv:150902971 Lindenberg et al [2022] Lindenberg B, Nordqvist J, Lindahl KO (2022) Conjugated discrete distributions for distributional reinforcement learning. In: AAAI 2022, pp 7516–7524 Ma et al [2020] Ma X, Xia L, Zhou Z, et al (2020) Dsac: distributional soft actor critic for risk-sensitive reinforcement learning. arXiv preprint arXiv:200414547 Marzouk et al [2016] Marzouk Y, Moselhy T, Parno M, et al (2016) Sampling via measure transport: An introduction. Handbook of uncertainty quantification 1:2 Mnih et al [2015] Mnih V, Kavukcuoglu K, Silver D, et al (2015) Human-level control through deep reinforcement learning. nature 518(7540):529–533 MORIMURA [2010] MORIMURA T (2010) Parametric return density estimation for reinforcement learning. 
In: Conference on Uncertainty in Artificial Intelligence, 2010 Nam et al [2021] Nam DW, Kim Y, Park CY (2021) Gmac: A distributional perspective on actor-critic framework. In: ICML 2021, PMLR, pp 7927–7936 Ororbia and Kifer [2022] Ororbia A, Kifer D (2022) The neural coding framework for learning generative models. Nature communications 13(1):2064 Parno and Marzouk [2018] Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Kingma DP, Welling M (2013) Auto-encoding variational bayes. arXiv preprint arXiv:13126114 Kingma et al [2014] Kingma DP, Mohamed S, Jimenez Rezende D, et al (2014) Semi-supervised learning with deep generative models. NIPS 2014 27 Konda and Tsitsiklis [1999] Konda V, Tsitsiklis J (1999) Actor-critic algorithms. Advances in neural information processing systems 12 Kuznetsov et al [2020] Kuznetsov A, Shvechikov P, Grishin A, et al (2020) Controlling overestimation bias with truncated mixture of continuous distributional quantile critics. 
In: International Conference on Machine Learning, PMLR, pp 5556–5566 Lillicrap et al [2015] Lillicrap TP, Hunt JJ, Pritzel A, et al (2015) Continuous control with deep reinforcement learning. arXiv preprint arXiv:150902971 Lindenberg et al [2022] Lindenberg B, Nordqvist J, Lindahl KO (2022) Conjugated discrete distributions for distributional reinforcement learning. In: AAAI 2022, pp 7516–7524 Ma et al [2020] Ma X, Xia L, Zhou Z, et al (2020) Dsac: distributional soft actor critic for risk-sensitive reinforcement learning. arXiv preprint arXiv:200414547 Marzouk et al [2016] Marzouk Y, Moselhy T, Parno M, et al (2016) Sampling via measure transport: An introduction. Handbook of uncertainty quantification 1:2 Mnih et al [2015] Mnih V, Kavukcuoglu K, Silver D, et al (2015) Human-level control through deep reinforcement learning. nature 518(7540):529–533 MORIMURA [2010] MORIMURA T (2010) Parametric return density estimation for reinforcement learning. In: Conference on Uncertainty in Artificial Intelligence, 2010 Nam et al [2021] Nam DW, Kim Y, Park CY (2021) Gmac: A distributional perspective on actor-critic framework. In: ICML 2021, PMLR, pp 7927–7936 Ororbia and Kifer [2022] Ororbia A, Kifer D (2022) The neural coding framework for learning generative models. Nature communications 13(1):2064 Parno and Marzouk [2018] Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. 
NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Kingma DP, Mohamed S, Jimenez Rezende D, et al (2014) Semi-supervised learning with deep generative models. NIPS 2014 27 Konda and Tsitsiklis [1999] Konda V, Tsitsiklis J (1999) Actor-critic algorithms. Advances in neural information processing systems 12 Kuznetsov et al [2020] Kuznetsov A, Shvechikov P, Grishin A, et al (2020) Controlling overestimation bias with truncated mixture of continuous distributional quantile critics. In: International Conference on Machine Learning, PMLR, pp 5556–5566 Lillicrap et al [2015] Lillicrap TP, Hunt JJ, Pritzel A, et al (2015) Continuous control with deep reinforcement learning. arXiv preprint arXiv:150902971 Lindenberg et al [2022] Lindenberg B, Nordqvist J, Lindahl KO (2022) Conjugated discrete distributions for distributional reinforcement learning. In: AAAI 2022, pp 7516–7524 Ma et al [2020] Ma X, Xia L, Zhou Z, et al (2020) Dsac: distributional soft actor critic for risk-sensitive reinforcement learning. arXiv preprint arXiv:200414547 Marzouk et al [2016] Marzouk Y, Moselhy T, Parno M, et al (2016) Sampling via measure transport: An introduction. Handbook of uncertainty quantification 1:2 Mnih et al [2015] Mnih V, Kavukcuoglu K, Silver D, et al (2015) Human-level control through deep reinforcement learning. nature 518(7540):529–533 MORIMURA [2010] MORIMURA T (2010) Parametric return density estimation for reinforcement learning. In: Conference on Uncertainty in Artificial Intelligence, 2010 Nam et al [2021] Nam DW, Kim Y, Park CY (2021) Gmac: A distributional perspective on actor-critic framework. In: ICML 2021, PMLR, pp 7927–7936 Ororbia and Kifer [2022] Ororbia A, Kifer D (2022) The neural coding framework for learning generative models. Nature communications 13(1):2064 Parno and Marzouk [2018] Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. 
In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Konda V, Tsitsiklis J (1999) Actor-critic algorithms. Advances in neural information processing systems 12 Kuznetsov et al [2020] Kuznetsov A, Shvechikov P, Grishin A, et al (2020) Controlling overestimation bias with truncated mixture of continuous distributional quantile critics. In: International Conference on Machine Learning, PMLR, pp 5556–5566 Lillicrap et al [2015] Lillicrap TP, Hunt JJ, Pritzel A, et al (2015) Continuous control with deep reinforcement learning. arXiv preprint arXiv:150902971 Lindenberg et al [2022] Lindenberg B, Nordqvist J, Lindahl KO (2022) Conjugated discrete distributions for distributional reinforcement learning. In: AAAI 2022, pp 7516–7524 Ma et al [2020] Ma X, Xia L, Zhou Z, et al (2020) Dsac: distributional soft actor critic for risk-sensitive reinforcement learning. arXiv preprint arXiv:200414547 Marzouk et al [2016] Marzouk Y, Moselhy T, Parno M, et al (2016) Sampling via measure transport: An introduction. Handbook of uncertainty quantification 1:2 Mnih et al [2015] Mnih V, Kavukcuoglu K, Silver D, et al (2015) Human-level control through deep reinforcement learning. nature 518(7540):529–533 MORIMURA [2010] MORIMURA T (2010) Parametric return density estimation for reinforcement learning. In: Conference on Uncertainty in Artificial Intelligence, 2010 Nam et al [2021] Nam DW, Kim Y, Park CY (2021) Gmac: A distributional perspective on actor-critic framework. In: ICML 2021, PMLR, pp 7927–7936 Ororbia and Kifer [2022] Ororbia A, Kifer D (2022) The neural coding framework for learning generative models. Nature communications 13(1):2064 Parno and Marzouk [2018] Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. 
Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Kuznetsov A, Shvechikov P, Grishin A, et al (2020) Controlling overestimation bias with truncated mixture of continuous distributional quantile critics. In: International Conference on Machine Learning, PMLR, pp 5556–5566 Lillicrap et al [2015] Lillicrap TP, Hunt JJ, Pritzel A, et al (2015) Continuous control with deep reinforcement learning. arXiv preprint arXiv:150902971 Lindenberg et al [2022] Lindenberg B, Nordqvist J, Lindahl KO (2022) Conjugated discrete distributions for distributional reinforcement learning. In: AAAI 2022, pp 7516–7524 Ma et al [2020] Ma X, Xia L, Zhou Z, et al (2020) Dsac: distributional soft actor critic for risk-sensitive reinforcement learning. arXiv preprint arXiv:200414547 Marzouk et al [2016] Marzouk Y, Moselhy T, Parno M, et al (2016) Sampling via measure transport: An introduction. Handbook of uncertainty quantification 1:2 Mnih et al [2015] Mnih V, Kavukcuoglu K, Silver D, et al (2015) Human-level control through deep reinforcement learning. nature 518(7540):529–533 MORIMURA [2010] MORIMURA T (2010) Parametric return density estimation for reinforcement learning. In: Conference on Uncertainty in Artificial Intelligence, 2010 Nam et al [2021] Nam DW, Kim Y, Park CY (2021) Gmac: A distributional perspective on actor-critic framework. In: ICML 2021, PMLR, pp 7927–7936 Ororbia and Kifer [2022] Ororbia A, Kifer D (2022) The neural coding framework for learning generative models. Nature communications 13(1):2064 Parno and Marzouk [2018] Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. 
Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Lillicrap TP, Hunt JJ, Pritzel A, et al (2015) Continuous control with deep reinforcement learning. arXiv preprint arXiv:150902971 Lindenberg et al [2022] Lindenberg B, Nordqvist J, Lindahl KO (2022) Conjugated discrete distributions for distributional reinforcement learning. In: AAAI 2022, pp 7516–7524 Ma et al [2020] Ma X, Xia L, Zhou Z, et al (2020) Dsac: distributional soft actor critic for risk-sensitive reinforcement learning. arXiv preprint arXiv:200414547 Marzouk et al [2016] Marzouk Y, Moselhy T, Parno M, et al (2016) Sampling via measure transport: An introduction. Handbook of uncertainty quantification 1:2 Mnih et al [2015] Mnih V, Kavukcuoglu K, Silver D, et al (2015) Human-level control through deep reinforcement learning. nature 518(7540):529–533 MORIMURA [2010] MORIMURA T (2010) Parametric return density estimation for reinforcement learning. In: Conference on Uncertainty in Artificial Intelligence, 2010 Nam et al [2021] Nam DW, Kim Y, Park CY (2021) Gmac: A distributional perspective on actor-critic framework. In: ICML 2021, PMLR, pp 7927–7936 Ororbia and Kifer [2022] Ororbia A, Kifer D (2022) The neural coding framework for learning generative models. 
Nature communications 13(1):2064 Parno and Marzouk [2018] Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Lindenberg B, Nordqvist J, Lindahl KO (2022) Conjugated discrete distributions for distributional reinforcement learning. In: AAAI 2022, pp 7516–7524 Ma et al [2020] Ma X, Xia L, Zhou Z, et al (2020) Dsac: distributional soft actor critic for risk-sensitive reinforcement learning. arXiv preprint arXiv:200414547 Marzouk et al [2016] Marzouk Y, Moselhy T, Parno M, et al (2016) Sampling via measure transport: An introduction. Handbook of uncertainty quantification 1:2 Mnih et al [2015] Mnih V, Kavukcuoglu K, Silver D, et al (2015) Human-level control through deep reinforcement learning. nature 518(7540):529–533 MORIMURA [2010] MORIMURA T (2010) Parametric return density estimation for reinforcement learning. In: Conference on Uncertainty in Artificial Intelligence, 2010 Nam et al [2021] Nam DW, Kim Y, Park CY (2021) Gmac: A distributional perspective on actor-critic framework. 
In: ICML 2021, PMLR, pp 7927–7936 Ororbia and Kifer [2022] Ororbia A, Kifer D (2022) The neural coding framework for learning generative models. Nature communications 13(1):2064 Parno and Marzouk [2018] Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Ma X, Xia L, Zhou Z, et al (2020) Dsac: distributional soft actor critic for risk-sensitive reinforcement learning. arXiv preprint arXiv:200414547 Marzouk et al [2016] Marzouk Y, Moselhy T, Parno M, et al (2016) Sampling via measure transport: An introduction. Handbook of uncertainty quantification 1:2 Mnih et al [2015] Mnih V, Kavukcuoglu K, Silver D, et al (2015) Human-level control through deep reinforcement learning. nature 518(7540):529–533 MORIMURA [2010] MORIMURA T (2010) Parametric return density estimation for reinforcement learning. In: Conference on Uncertainty in Artificial Intelligence, 2010 Nam et al [2021] Nam DW, Kim Y, Park CY (2021) Gmac: A distributional perspective on actor-critic framework. 
In: ICML 2021, PMLR, pp 7927–7936 Ororbia and Kifer [2022] Ororbia A, Kifer D (2022) The neural coding framework for learning generative models. Nature communications 13(1):2064 Parno and Marzouk [2018] Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Marzouk Y, Moselhy T, Parno M, et al (2016) Sampling via measure transport: An introduction. Handbook of uncertainty quantification 1:2 Mnih et al [2015] Mnih V, Kavukcuoglu K, Silver D, et al (2015) Human-level control through deep reinforcement learning. nature 518(7540):529–533 MORIMURA [2010] MORIMURA T (2010) Parametric return density estimation for reinforcement learning. In: Conference on Uncertainty in Artificial Intelligence, 2010 Nam et al [2021] Nam DW, Kim Y, Park CY (2021) Gmac: A distributional perspective on actor-critic framework. In: ICML 2021, PMLR, pp 7927–7936 Ororbia and Kifer [2022] Ororbia A, Kifer D (2022) The neural coding framework for learning generative models. 
Nature communications 13(1):2064 Parno and Marzouk [2018] Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Mnih V, Kavukcuoglu K, Silver D, et al (2015) Human-level control through deep reinforcement learning. nature 518(7540):529–533 MORIMURA [2010] MORIMURA T (2010) Parametric return density estimation for reinforcement learning. In: Conference on Uncertainty in Artificial Intelligence, 2010 Nam et al [2021] Nam DW, Kim Y, Park CY (2021) Gmac: A distributional perspective on actor-critic framework. In: ICML 2021, PMLR, pp 7927–7936 Ororbia and Kifer [2022] Ororbia A, Kifer D (2022) The neural coding framework for learning generative models. Nature communications 13(1):2064 Parno and Marzouk [2018] Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. 
Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University MORIMURA T (2010) Parametric return density estimation for reinforcement learning. In: Conference on Uncertainty in Artificial Intelligence, 2010 Nam et al [2021] Nam DW, Kim Y, Park CY (2021) Gmac: A distributional perspective on actor-critic framework. In: ICML 2021, PMLR, pp 7927–7936 Ororbia and Kifer [2022] Ororbia A, Kifer D (2022) The neural coding framework for learning generative models. Nature communications 13(1):2064 Parno and Marzouk [2018] Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. 
Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Nam DW, Kim Y, Park CY (2021) Gmac: A distributional perspective on actor-critic framework. In: ICML 2021, PMLR, pp 7927–7936 Ororbia and Kifer [2022] Ororbia A, Kifer D (2022) The neural coding framework for learning generative models. Nature communications 13(1):2064 Parno and Marzouk [2018] Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. 
In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Ororbia A, Kifer D (2022) The neural coding framework for learning generative models. Nature communications 13(1):2064 Parno and Marzouk [2018] Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. 
Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. 
In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. 
NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. 
In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. 
Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. 
Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University
- Choi J, Dance C, Kim Je, et al (2021) Risk-conditioned distributional soft actor-critic for risk-sensitive navigation. In: ICRA 2021, IEEE, pp 8337–8344 Chow et al [2015] Chow Y, Tamar A, Mannor S, et al (2015) Risk-sensitive and robust decision-making: a cvar optimization approach. Advances in neural information processing systems 28 Chow et al [2017] Chow Y, Ghavamzadeh M, Janson L, et al (2017) Risk-constrained reinforcement learning with percentile risk criteria. J Mach Learn Res 18(1):6070–6120 Creswell et al [2018] Creswell A, White T, Dumoulin V, et al (2018) Generative adversarial networks: An overview. IEEE signal processing magazine 35(1):53–65 Dabney et al [2018a] Dabney W, Ostrovski G, Silver D, et al (2018a) Implicit quantile networks for distributional reinforcement learning. In: ICML 2018, PMLR, pp 1096–1105 Dabney et al [2018b] Dabney W, Rowland M, Bellemare M, et al (2018b) Distributional reinforcement learning with quantile regression. In: AAAI 2018 Duan et al [2021] Duan J, Guan Y, Li SE, et al (2021) Distributional soft actor-critic: Off-policy reinforcement learning for addressing value estimation errors. IEEE transactions on neural networks and learning systems Engel et al [2005] Engel Y, Mannor S, Meir R (2005) Reinforcement learning with gaussian processes. In: Proceedings of the 22nd international conference on Machine learning, pp 201–208 Fujimoto et al [2018] Fujimoto S, Hoof H, Meger D (2018) Addressing function approximation error in actor-critic methods. In: ICML 2018, PMLR, pp 1587–1596 Goodfellow et al [2020] Goodfellow I, Pouget-Abadie J, Mirza M, et al (2020) Generative adversarial networks. Communications of the ACM 63(11):139–144 Haarnoja et al [2017] Haarnoja T, Tang H, Abbeel P, et al (2017) Reinforcement learning with deep energy-based policies. In: ICML 2017, PMLR, pp 1352–1361 Haarnoja et al [2018] Haarnoja T, Zhou A, Hartikainen K, et al (2018) Soft actor-critic algorithms and applications. arXiv preprint arXiv:181205905 Heess et al [2015] Heess N, Wayne G, Silver D, et al (2015) Learning continuous control policies by stochastic value gradients. Advances in neural information processing systems 28 Kingma and Welling [2013] Kingma DP, Welling M (2013) Auto-encoding variational bayes. arXiv preprint arXiv:13126114 Kingma et al [2014] Kingma DP, Mohamed S, Jimenez Rezende D, et al (2014) Semi-supervised learning with deep generative models. NIPS 2014 27 Konda and Tsitsiklis [1999] Konda V, Tsitsiklis J (1999) Actor-critic algorithms. Advances in neural information processing systems 12 Kuznetsov et al [2020] Kuznetsov A, Shvechikov P, Grishin A, et al (2020) Controlling overestimation bias with truncated mixture of continuous distributional quantile critics. In: International Conference on Machine Learning, PMLR, pp 5556–5566 Lillicrap et al [2015] Lillicrap TP, Hunt JJ, Pritzel A, et al (2015) Continuous control with deep reinforcement learning. arXiv preprint arXiv:150902971 Lindenberg et al [2022] Lindenberg B, Nordqvist J, Lindahl KO (2022) Conjugated discrete distributions for distributional reinforcement learning. In: AAAI 2022, pp 7516–7524 Ma et al [2020] Ma X, Xia L, Zhou Z, et al (2020) Dsac: distributional soft actor critic for risk-sensitive reinforcement learning. arXiv preprint arXiv:200414547 Marzouk et al [2016] Marzouk Y, Moselhy T, Parno M, et al (2016) Sampling via measure transport: An introduction. 
Handbook of uncertainty quantification 1:2 Mnih et al [2015] Mnih V, Kavukcuoglu K, Silver D, et al (2015) Human-level control through deep reinforcement learning. nature 518(7540):529–533 MORIMURA [2010] MORIMURA T (2010) Parametric return density estimation for reinforcement learning. In: Conference on Uncertainty in Artificial Intelligence, 2010 Nam et al [2021] Nam DW, Kim Y, Park CY (2021) Gmac: A distributional perspective on actor-critic framework. In: ICML 2021, PMLR, pp 7927–7936 Ororbia and Kifer [2022] Ororbia A, Kifer D (2022) The neural coding framework for learning generative models. Nature communications 13(1):2064 Parno and Marzouk [2018] Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Chow Y, Tamar A, Mannor S, et al (2015) Risk-sensitive and robust decision-making: a cvar optimization approach. Advances in neural information processing systems 28 Chow et al [2017] Chow Y, Ghavamzadeh M, Janson L, et al (2017) Risk-constrained reinforcement learning with percentile risk criteria. 
J Mach Learn Res 18(1):6070–6120
Creswell et al [2018] Creswell A, White T, Dumoulin V, et al (2018) Generative adversarial networks: An overview. IEEE Signal Processing Magazine 35(1):53–65
Dabney et al [2018a] Dabney W, Ostrovski G, Silver D, et al (2018a) Implicit quantile networks for distributional reinforcement learning. In: ICML 2018, PMLR, pp 1096–1105
Dabney et al [2018b] Dabney W, Rowland M, Bellemare M, et al (2018b) Distributional reinforcement learning with quantile regression. In: AAAI 2018
Duan et al [2021] Duan J, Guan Y, Li SE, et al (2021) Distributional soft actor-critic: Off-policy reinforcement learning for addressing value estimation errors. IEEE Transactions on Neural Networks and Learning Systems
Engel et al [2005] Engel Y, Mannor S, Meir R (2005) Reinforcement learning with Gaussian processes. In: Proceedings of the 22nd International Conference on Machine Learning, pp 201–208
Fujimoto et al [2018] Fujimoto S, Hoof H, Meger D (2018) Addressing function approximation error in actor-critic methods. In: ICML 2018, PMLR, pp 1587–1596
Goodfellow et al [2020] Goodfellow I, Pouget-Abadie J, Mirza M, et al (2020) Generative adversarial networks. Communications of the ACM 63(11):139–144
Haarnoja et al [2017] Haarnoja T, Tang H, Abbeel P, et al (2017) Reinforcement learning with deep energy-based policies. In: ICML 2017, PMLR, pp 1352–1361
Haarnoja et al [2018] Haarnoja T, Zhou A, Hartikainen K, et al (2018) Soft actor-critic algorithms and applications. arXiv preprint arXiv:1812.05905
Heess et al [2015] Heess N, Wayne G, Silver D, et al (2015) Learning continuous control policies by stochastic value gradients. Advances in Neural Information Processing Systems 28
Kingma and Welling [2013] Kingma DP, Welling M (2013) Auto-encoding variational Bayes. arXiv preprint arXiv:1312.6114
Kingma et al [2014] Kingma DP, Mohamed S, Jimenez Rezende D, et al (2014) Semi-supervised learning with deep generative models. Advances in Neural Information Processing Systems 27
Konda and Tsitsiklis [1999] Konda V, Tsitsiklis J (1999) Actor-critic algorithms. Advances in Neural Information Processing Systems 12
Kuznetsov et al [2020] Kuznetsov A, Shvechikov P, Grishin A, et al (2020) Controlling overestimation bias with truncated mixture of continuous distributional quantile critics. In: ICML 2020, PMLR, pp 5556–5566
Lillicrap et al [2015] Lillicrap TP, Hunt JJ, Pritzel A, et al (2015) Continuous control with deep reinforcement learning. arXiv preprint arXiv:1509.02971
Lindenberg et al [2022] Lindenberg B, Nordqvist J, Lindahl KO (2022) Conjugated discrete distributions for distributional reinforcement learning. In: AAAI 2022, pp 7516–7524
Ma et al [2020] Ma X, Xia L, Zhou Z, et al (2020) DSAC: Distributional soft actor critic for risk-sensitive reinforcement learning. arXiv preprint arXiv:2004.14547
Marzouk et al [2016] Marzouk Y, Moselhy T, Parno M, et al (2016) Sampling via measure transport: An introduction. Handbook of Uncertainty Quantification 1:2
Mnih et al [2015] Mnih V, Kavukcuoglu K, Silver D, et al (2015) Human-level control through deep reinforcement learning. Nature 518(7540):529–533
Morimura [2010] Morimura T (2010) Parametric return density estimation for reinforcement learning. In: UAI 2010
Nam et al [2021] Nam DW, Kim Y, Park CY (2021) GMAC: A distributional perspective on actor-critic framework. In: ICML 2021, PMLR, pp 7927–7936
Ororbia and Kifer [2022] Ororbia A, Kifer D (2022) The neural coding framework for learning generative models. Nature Communications 13(1):2064
Parno and Marzouk [2018] Parno MD, Marzouk YM (2018) Transport map accelerated Markov chain Monte Carlo. SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682
Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for Markov chain Monte Carlo. Adv Comput Math 45:2321–2348
Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5–6):355–607
Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward MDPs. Machine Learning 105(3):367–417
Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693
Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37
Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) TD algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif Intell 16(3):353–362
Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:1506.02438
Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968
Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688
Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: ICLR 2018
Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement Learning: An Introduction. MIT Press
Villani [2009] Villani C (2009) Optimal Transport: Old and New, vol 338. Springer
Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. Advances in Neural Information Processing Systems 32
Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147
Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. PhD thesis, Carnegie Mellon University
NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Kingma DP, Mohamed S, Jimenez Rezende D, et al (2014) Semi-supervised learning with deep generative models. NIPS 2014 27 Konda and Tsitsiklis [1999] Konda V, Tsitsiklis J (1999) Actor-critic algorithms. Advances in neural information processing systems 12 Kuznetsov et al [2020] Kuznetsov A, Shvechikov P, Grishin A, et al (2020) Controlling overestimation bias with truncated mixture of continuous distributional quantile critics. In: International Conference on Machine Learning, PMLR, pp 5556–5566 Lillicrap et al [2015] Lillicrap TP, Hunt JJ, Pritzel A, et al (2015) Continuous control with deep reinforcement learning. arXiv preprint arXiv:150902971 Lindenberg et al [2022] Lindenberg B, Nordqvist J, Lindahl KO (2022) Conjugated discrete distributions for distributional reinforcement learning. In: AAAI 2022, pp 7516–7524 Ma et al [2020] Ma X, Xia L, Zhou Z, et al (2020) Dsac: distributional soft actor critic for risk-sensitive reinforcement learning. arXiv preprint arXiv:200414547 Marzouk et al [2016] Marzouk Y, Moselhy T, Parno M, et al (2016) Sampling via measure transport: An introduction. Handbook of uncertainty quantification 1:2 Mnih et al [2015] Mnih V, Kavukcuoglu K, Silver D, et al (2015) Human-level control through deep reinforcement learning. nature 518(7540):529–533 MORIMURA [2010] MORIMURA T (2010) Parametric return density estimation for reinforcement learning. In: Conference on Uncertainty in Artificial Intelligence, 2010 Nam et al [2021] Nam DW, Kim Y, Park CY (2021) Gmac: A distributional perspective on actor-critic framework. In: ICML 2021, PMLR, pp 7927–7936 Ororbia and Kifer [2022] Ororbia A, Kifer D (2022) The neural coding framework for learning generative models. Nature communications 13(1):2064 Parno and Marzouk [2018] Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. 
In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Konda V, Tsitsiklis J (1999) Actor-critic algorithms. Advances in neural information processing systems 12 Kuznetsov et al [2020] Kuznetsov A, Shvechikov P, Grishin A, et al (2020) Controlling overestimation bias with truncated mixture of continuous distributional quantile critics. In: International Conference on Machine Learning, PMLR, pp 5556–5566 Lillicrap et al [2015] Lillicrap TP, Hunt JJ, Pritzel A, et al (2015) Continuous control with deep reinforcement learning. arXiv preprint arXiv:150902971 Lindenberg et al [2022] Lindenberg B, Nordqvist J, Lindahl KO (2022) Conjugated discrete distributions for distributional reinforcement learning. In: AAAI 2022, pp 7516–7524 Ma et al [2020] Ma X, Xia L, Zhou Z, et al (2020) Dsac: distributional soft actor critic for risk-sensitive reinforcement learning. arXiv preprint arXiv:200414547 Marzouk et al [2016] Marzouk Y, Moselhy T, Parno M, et al (2016) Sampling via measure transport: An introduction. Handbook of uncertainty quantification 1:2 Mnih et al [2015] Mnih V, Kavukcuoglu K, Silver D, et al (2015) Human-level control through deep reinforcement learning. nature 518(7540):529–533 MORIMURA [2010] MORIMURA T (2010) Parametric return density estimation for reinforcement learning. In: Conference on Uncertainty in Artificial Intelligence, 2010 Nam et al [2021] Nam DW, Kim Y, Park CY (2021) Gmac: A distributional perspective on actor-critic framework. In: ICML 2021, PMLR, pp 7927–7936 Ororbia and Kifer [2022] Ororbia A, Kifer D (2022) The neural coding framework for learning generative models. Nature communications 13(1):2064 Parno and Marzouk [2018] Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. 
Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Kuznetsov A, Shvechikov P, Grishin A, et al (2020) Controlling overestimation bias with truncated mixture of continuous distributional quantile critics. In: International Conference on Machine Learning, PMLR, pp 5556–5566 Lillicrap et al [2015] Lillicrap TP, Hunt JJ, Pritzel A, et al (2015) Continuous control with deep reinforcement learning. arXiv preprint arXiv:150902971 Lindenberg et al [2022] Lindenberg B, Nordqvist J, Lindahl KO (2022) Conjugated discrete distributions for distributional reinforcement learning. In: AAAI 2022, pp 7516–7524 Ma et al [2020] Ma X, Xia L, Zhou Z, et al (2020) Dsac: distributional soft actor critic for risk-sensitive reinforcement learning. arXiv preprint arXiv:200414547 Marzouk et al [2016] Marzouk Y, Moselhy T, Parno M, et al (2016) Sampling via measure transport: An introduction. Handbook of uncertainty quantification 1:2 Mnih et al [2015] Mnih V, Kavukcuoglu K, Silver D, et al (2015) Human-level control through deep reinforcement learning. nature 518(7540):529–533 MORIMURA [2010] MORIMURA T (2010) Parametric return density estimation for reinforcement learning. In: Conference on Uncertainty in Artificial Intelligence, 2010 Nam et al [2021] Nam DW, Kim Y, Park CY (2021) Gmac: A distributional perspective on actor-critic framework. In: ICML 2021, PMLR, pp 7927–7936 Ororbia and Kifer [2022] Ororbia A, Kifer D (2022) The neural coding framework for learning generative models. Nature communications 13(1):2064 Parno and Marzouk [2018] Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. 
Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Lillicrap TP, Hunt JJ, Pritzel A, et al (2015) Continuous control with deep reinforcement learning. arXiv preprint arXiv:150902971 Lindenberg et al [2022] Lindenberg B, Nordqvist J, Lindahl KO (2022) Conjugated discrete distributions for distributional reinforcement learning. In: AAAI 2022, pp 7516–7524 Ma et al [2020] Ma X, Xia L, Zhou Z, et al (2020) Dsac: distributional soft actor critic for risk-sensitive reinforcement learning. arXiv preprint arXiv:200414547 Marzouk et al [2016] Marzouk Y, Moselhy T, Parno M, et al (2016) Sampling via measure transport: An introduction. Handbook of uncertainty quantification 1:2 Mnih et al [2015] Mnih V, Kavukcuoglu K, Silver D, et al (2015) Human-level control through deep reinforcement learning. nature 518(7540):529–533 MORIMURA [2010] MORIMURA T (2010) Parametric return density estimation for reinforcement learning. In: Conference on Uncertainty in Artificial Intelligence, 2010 Nam et al [2021] Nam DW, Kim Y, Park CY (2021) Gmac: A distributional perspective on actor-critic framework. In: ICML 2021, PMLR, pp 7927–7936 Ororbia and Kifer [2022] Ororbia A, Kifer D (2022) The neural coding framework for learning generative models. 
Nature communications 13(1):2064 Parno and Marzouk [2018] Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Lindenberg B, Nordqvist J, Lindahl KO (2022) Conjugated discrete distributions for distributional reinforcement learning. In: AAAI 2022, pp 7516–7524 Ma et al [2020] Ma X, Xia L, Zhou Z, et al (2020) Dsac: distributional soft actor critic for risk-sensitive reinforcement learning. arXiv preprint arXiv:200414547 Marzouk et al [2016] Marzouk Y, Moselhy T, Parno M, et al (2016) Sampling via measure transport: An introduction. Handbook of uncertainty quantification 1:2 Mnih et al [2015] Mnih V, Kavukcuoglu K, Silver D, et al (2015) Human-level control through deep reinforcement learning. nature 518(7540):529–533 MORIMURA [2010] MORIMURA T (2010) Parametric return density estimation for reinforcement learning. In: Conference on Uncertainty in Artificial Intelligence, 2010 Nam et al [2021] Nam DW, Kim Y, Park CY (2021) Gmac: A distributional perspective on actor-critic framework. 
In: ICML 2021, PMLR, pp 7927–7936 Ororbia and Kifer [2022] Ororbia A, Kifer D (2022) The neural coding framework for learning generative models. Nature communications 13(1):2064 Parno and Marzouk [2018] Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Ma X, Xia L, Zhou Z, et al (2020) Dsac: distributional soft actor critic for risk-sensitive reinforcement learning. arXiv preprint arXiv:200414547 Marzouk et al [2016] Marzouk Y, Moselhy T, Parno M, et al (2016) Sampling via measure transport: An introduction. Handbook of uncertainty quantification 1:2 Mnih et al [2015] Mnih V, Kavukcuoglu K, Silver D, et al (2015) Human-level control through deep reinforcement learning. nature 518(7540):529–533 MORIMURA [2010] MORIMURA T (2010) Parametric return density estimation for reinforcement learning. In: Conference on Uncertainty in Artificial Intelligence, 2010 Nam et al [2021] Nam DW, Kim Y, Park CY (2021) Gmac: A distributional perspective on actor-critic framework. 
In: ICML 2021, PMLR, pp 7927–7936 Ororbia and Kifer [2022] Ororbia A, Kifer D (2022) The neural coding framework for learning generative models. Nature communications 13(1):2064 Parno and Marzouk [2018] Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Marzouk Y, Moselhy T, Parno M, et al (2016) Sampling via measure transport: An introduction. Handbook of uncertainty quantification 1:2 Mnih et al [2015] Mnih V, Kavukcuoglu K, Silver D, et al (2015) Human-level control through deep reinforcement learning. nature 518(7540):529–533 MORIMURA [2010] MORIMURA T (2010) Parametric return density estimation for reinforcement learning. In: Conference on Uncertainty in Artificial Intelligence, 2010 Nam et al [2021] Nam DW, Kim Y, Park CY (2021) Gmac: A distributional perspective on actor-critic framework. In: ICML 2021, PMLR, pp 7927–7936 Ororbia and Kifer [2022] Ororbia A, Kifer D (2022) The neural coding framework for learning generative models. 
Nature communications 13(1):2064 Parno and Marzouk [2018] Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Mnih V, Kavukcuoglu K, Silver D, et al (2015) Human-level control through deep reinforcement learning. nature 518(7540):529–533 MORIMURA [2010] MORIMURA T (2010) Parametric return density estimation for reinforcement learning. In: Conference on Uncertainty in Artificial Intelligence, 2010 Nam et al [2021] Nam DW, Kim Y, Park CY (2021) Gmac: A distributional perspective on actor-critic framework. In: ICML 2021, PMLR, pp 7927–7936 Ororbia and Kifer [2022] Ororbia A, Kifer D (2022) The neural coding framework for learning generative models. Nature communications 13(1):2064 Parno and Marzouk [2018] Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. 
Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University MORIMURA T (2010) Parametric return density estimation for reinforcement learning. In: Conference on Uncertainty in Artificial Intelligence, 2010 Nam et al [2021] Nam DW, Kim Y, Park CY (2021) Gmac: A distributional perspective on actor-critic framework. In: ICML 2021, PMLR, pp 7927–7936 Ororbia and Kifer [2022] Ororbia A, Kifer D (2022) The neural coding framework for learning generative models. Nature communications 13(1):2064 Parno and Marzouk [2018] Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. 
Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Nam DW, Kim Y, Park CY (2021) Gmac: A distributional perspective on actor-critic framework. In: ICML 2021, PMLR, pp 7927–7936 Ororbia and Kifer [2022] Ororbia A, Kifer D (2022) The neural coding framework for learning generative models. Nature communications 13(1):2064 Parno and Marzouk [2018] Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. 
In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Ororbia A, Kifer D (2022) The neural coding framework for learning generative models. Nature communications 13(1):2064 Parno and Marzouk [2018] Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. 
Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. 
In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. 
NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. PhD thesis, Carnegie Mellon University
Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Heess N, Wayne G, Silver D, et al (2015) Learning continuous control policies by stochastic value gradients. Advances in neural information processing systems 28 Kingma and Welling [2013] Kingma DP, Welling M (2013) Auto-encoding variational bayes. arXiv preprint arXiv:13126114 Kingma et al [2014] Kingma DP, Mohamed S, Jimenez Rezende D, et al (2014) Semi-supervised learning with deep generative models. NIPS 2014 27 Konda and Tsitsiklis [1999] Konda V, Tsitsiklis J (1999) Actor-critic algorithms. Advances in neural information processing systems 12 Kuznetsov et al [2020] Kuznetsov A, Shvechikov P, Grishin A, et al (2020) Controlling overestimation bias with truncated mixture of continuous distributional quantile critics. In: International Conference on Machine Learning, PMLR, pp 5556–5566 Lillicrap et al [2015] Lillicrap TP, Hunt JJ, Pritzel A, et al (2015) Continuous control with deep reinforcement learning. arXiv preprint arXiv:150902971 Lindenberg et al [2022] Lindenberg B, Nordqvist J, Lindahl KO (2022) Conjugated discrete distributions for distributional reinforcement learning. In: AAAI 2022, pp 7516–7524 Ma et al [2020] Ma X, Xia L, Zhou Z, et al (2020) Dsac: distributional soft actor critic for risk-sensitive reinforcement learning. 
arXiv preprint arXiv:200414547 Marzouk et al [2016] Marzouk Y, Moselhy T, Parno M, et al (2016) Sampling via measure transport: An introduction. Handbook of uncertainty quantification 1:2 Mnih et al [2015] Mnih V, Kavukcuoglu K, Silver D, et al (2015) Human-level control through deep reinforcement learning. nature 518(7540):529–533 MORIMURA [2010] MORIMURA T (2010) Parametric return density estimation for reinforcement learning. In: Conference on Uncertainty in Artificial Intelligence, 2010 Nam et al [2021] Nam DW, Kim Y, Park CY (2021) Gmac: A distributional perspective on actor-critic framework. In: ICML 2021, PMLR, pp 7927–7936 Ororbia and Kifer [2022] Ororbia A, Kifer D (2022) The neural coding framework for learning generative models. Nature communications 13(1):2064 Parno and Marzouk [2018] Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Kingma DP, Welling M (2013) Auto-encoding variational bayes. arXiv preprint arXiv:13126114 Kingma et al [2014] Kingma DP, Mohamed S, Jimenez Rezende D, et al (2014) Semi-supervised learning with deep generative models. 
NIPS 2014 27 Konda and Tsitsiklis [1999] Konda V, Tsitsiklis J (1999) Actor-critic algorithms. Advances in neural information processing systems 12 Kuznetsov et al [2020] Kuznetsov A, Shvechikov P, Grishin A, et al (2020) Controlling overestimation bias with truncated mixture of continuous distributional quantile critics. In: International Conference on Machine Learning, PMLR, pp 5556–5566 Lillicrap et al [2015] Lillicrap TP, Hunt JJ, Pritzel A, et al (2015) Continuous control with deep reinforcement learning. arXiv preprint arXiv:150902971 Lindenberg et al [2022] Lindenberg B, Nordqvist J, Lindahl KO (2022) Conjugated discrete distributions for distributional reinforcement learning. In: AAAI 2022, pp 7516–7524 Ma et al [2020] Ma X, Xia L, Zhou Z, et al (2020) Dsac: distributional soft actor critic for risk-sensitive reinforcement learning. arXiv preprint arXiv:200414547 Marzouk et al [2016] Marzouk Y, Moselhy T, Parno M, et al (2016) Sampling via measure transport: An introduction. Handbook of uncertainty quantification 1:2 Mnih et al [2015] Mnih V, Kavukcuoglu K, Silver D, et al (2015) Human-level control through deep reinforcement learning. nature 518(7540):529–533 MORIMURA [2010] MORIMURA T (2010) Parametric return density estimation for reinforcement learning. In: Conference on Uncertainty in Artificial Intelligence, 2010 Nam et al [2021] Nam DW, Kim Y, Park CY (2021) Gmac: A distributional perspective on actor-critic framework. In: ICML 2021, PMLR, pp 7927–7936 Ororbia and Kifer [2022] Ororbia A, Kifer D (2022) The neural coding framework for learning generative models. Nature communications 13(1):2064 Parno and Marzouk [2018] Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. 
In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Kingma DP, Mohamed S, Jimenez Rezende D, et al (2014) Semi-supervised learning with deep generative models. NIPS 2014 27 Konda and Tsitsiklis [1999] Konda V, Tsitsiklis J (1999) Actor-critic algorithms. Advances in neural information processing systems 12 Kuznetsov et al [2020] Kuznetsov A, Shvechikov P, Grishin A, et al (2020) Controlling overestimation bias with truncated mixture of continuous distributional quantile critics. In: International Conference on Machine Learning, PMLR, pp 5556–5566 Lillicrap et al [2015] Lillicrap TP, Hunt JJ, Pritzel A, et al (2015) Continuous control with deep reinforcement learning. arXiv preprint arXiv:150902971 Lindenberg et al [2022] Lindenberg B, Nordqvist J, Lindahl KO (2022) Conjugated discrete distributions for distributional reinforcement learning. In: AAAI 2022, pp 7516–7524 Ma et al [2020] Ma X, Xia L, Zhou Z, et al (2020) Dsac: distributional soft actor critic for risk-sensitive reinforcement learning. arXiv preprint arXiv:200414547 Marzouk et al [2016] Marzouk Y, Moselhy T, Parno M, et al (2016) Sampling via measure transport: An introduction. Handbook of uncertainty quantification 1:2 Mnih et al [2015] Mnih V, Kavukcuoglu K, Silver D, et al (2015) Human-level control through deep reinforcement learning. nature 518(7540):529–533 MORIMURA [2010] MORIMURA T (2010) Parametric return density estimation for reinforcement learning. In: Conference on Uncertainty in Artificial Intelligence, 2010 Nam et al [2021] Nam DW, Kim Y, Park CY (2021) Gmac: A distributional perspective on actor-critic framework. In: ICML 2021, PMLR, pp 7927–7936 Ororbia and Kifer [2022] Ororbia A, Kifer D (2022) The neural coding framework for learning generative models. Nature communications 13(1):2064 Parno and Marzouk [2018] Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. 
In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Konda V, Tsitsiklis J (1999) Actor-critic algorithms. Advances in neural information processing systems 12 Kuznetsov et al [2020] Kuznetsov A, Shvechikov P, Grishin A, et al (2020) Controlling overestimation bias with truncated mixture of continuous distributional quantile critics. In: International Conference on Machine Learning, PMLR, pp 5556–5566 Lillicrap et al [2015] Lillicrap TP, Hunt JJ, Pritzel A, et al (2015) Continuous control with deep reinforcement learning. arXiv preprint arXiv:150902971 Lindenberg et al [2022] Lindenberg B, Nordqvist J, Lindahl KO (2022) Conjugated discrete distributions for distributional reinforcement learning. In: AAAI 2022, pp 7516–7524 Ma et al [2020] Ma X, Xia L, Zhou Z, et al (2020) Dsac: distributional soft actor critic for risk-sensitive reinforcement learning. arXiv preprint arXiv:200414547 Marzouk et al [2016] Marzouk Y, Moselhy T, Parno M, et al (2016) Sampling via measure transport: An introduction. Handbook of uncertainty quantification 1:2 Mnih et al [2015] Mnih V, Kavukcuoglu K, Silver D, et al (2015) Human-level control through deep reinforcement learning. nature 518(7540):529–533 MORIMURA [2010] MORIMURA T (2010) Parametric return density estimation for reinforcement learning. In: Conference on Uncertainty in Artificial Intelligence, 2010 Nam et al [2021] Nam DW, Kim Y, Park CY (2021) Gmac: A distributional perspective on actor-critic framework. In: ICML 2021, PMLR, pp 7927–7936 Ororbia and Kifer [2022] Ororbia A, Kifer D (2022) The neural coding framework for learning generative models. Nature communications 13(1):2064 Parno and Marzouk [2018] Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. 
Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Kuznetsov A, Shvechikov P, Grishin A, et al (2020) Controlling overestimation bias with truncated mixture of continuous distributional quantile critics. In: International Conference on Machine Learning, PMLR, pp 5556–5566 Lillicrap et al [2015] Lillicrap TP, Hunt JJ, Pritzel A, et al (2015) Continuous control with deep reinforcement learning. arXiv preprint arXiv:150902971 Lindenberg et al [2022] Lindenberg B, Nordqvist J, Lindahl KO (2022) Conjugated discrete distributions for distributional reinforcement learning. In: AAAI 2022, pp 7516–7524 Ma et al [2020] Ma X, Xia L, Zhou Z, et al (2020) Dsac: distributional soft actor critic for risk-sensitive reinforcement learning. arXiv preprint arXiv:200414547 Marzouk et al [2016] Marzouk Y, Moselhy T, Parno M, et al (2016) Sampling via measure transport: An introduction. Handbook of uncertainty quantification 1:2 Mnih et al [2015] Mnih V, Kavukcuoglu K, Silver D, et al (2015) Human-level control through deep reinforcement learning. nature 518(7540):529–533 MORIMURA [2010] MORIMURA T (2010) Parametric return density estimation for reinforcement learning. In: Conference on Uncertainty in Artificial Intelligence, 2010 Nam et al [2021] Nam DW, Kim Y, Park CY (2021) Gmac: A distributional perspective on actor-critic framework. 
In: ICML 2021, PMLR, pp 7927–7936 Ororbia and Kifer [2022] Ororbia A, Kifer D (2022) The neural coding framework for learning generative models. Nature communications 13(1):2064 Parno and Marzouk [2018] Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Lillicrap TP, Hunt JJ, Pritzel A, et al (2015) Continuous control with deep reinforcement learning. arXiv preprint arXiv:150902971 Lindenberg et al [2022] Lindenberg B, Nordqvist J, Lindahl KO (2022) Conjugated discrete distributions for distributional reinforcement learning. In: AAAI 2022, pp 7516–7524 Ma et al [2020] Ma X, Xia L, Zhou Z, et al (2020) Dsac: distributional soft actor critic for risk-sensitive reinforcement learning. arXiv preprint arXiv:200414547 Marzouk et al [2016] Marzouk Y, Moselhy T, Parno M, et al (2016) Sampling via measure transport: An introduction. Handbook of uncertainty quantification 1:2 Mnih et al [2015] Mnih V, Kavukcuoglu K, Silver D, et al (2015) Human-level control through deep reinforcement learning. 
nature 518(7540):529–533 MORIMURA [2010] MORIMURA T (2010) Parametric return density estimation for reinforcement learning. In: Conference on Uncertainty in Artificial Intelligence, 2010 Nam et al [2021] Nam DW, Kim Y, Park CY (2021) Gmac: A distributional perspective on actor-critic framework. In: ICML 2021, PMLR, pp 7927–7936 Ororbia and Kifer [2022] Ororbia A, Kifer D (2022) The neural coding framework for learning generative models. Nature communications 13(1):2064 Parno and Marzouk [2018] Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Lindenberg B, Nordqvist J, Lindahl KO (2022) Conjugated discrete distributions for distributional reinforcement learning. In: AAAI 2022, pp 7516–7524 Ma et al [2020] Ma X, Xia L, Zhou Z, et al (2020) Dsac: distributional soft actor critic for risk-sensitive reinforcement learning. arXiv preprint arXiv:200414547 Marzouk et al [2016] Marzouk Y, Moselhy T, Parno M, et al (2016) Sampling via measure transport: An introduction. 
Handbook of uncertainty quantification 1:2 Mnih et al [2015] Mnih V, Kavukcuoglu K, Silver D, et al (2015) Human-level control through deep reinforcement learning. nature 518(7540):529–533 MORIMURA [2010] MORIMURA T (2010) Parametric return density estimation for reinforcement learning. In: Conference on Uncertainty in Artificial Intelligence, 2010 Nam et al [2021] Nam DW, Kim Y, Park CY (2021) Gmac: A distributional perspective on actor-critic framework. In: ICML 2021, PMLR, pp 7927–7936 Ororbia and Kifer [2022] Ororbia A, Kifer D (2022) The neural coding framework for learning generative models. Nature communications 13(1):2064 Parno and Marzouk [2018] Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Ma X, Xia L, Zhou Z, et al (2020) Dsac: distributional soft actor critic for risk-sensitive reinforcement learning. arXiv preprint arXiv:200414547 Marzouk et al [2016] Marzouk Y, Moselhy T, Parno M, et al (2016) Sampling via measure transport: An introduction. 
Handbook of uncertainty quantification 1:2 Mnih et al [2015] Mnih V, Kavukcuoglu K, Silver D, et al (2015) Human-level control through deep reinforcement learning. nature 518(7540):529–533 MORIMURA [2010] MORIMURA T (2010) Parametric return density estimation for reinforcement learning. In: Conference on Uncertainty in Artificial Intelligence, 2010 Nam et al [2021] Nam DW, Kim Y, Park CY (2021) Gmac: A distributional perspective on actor-critic framework. In: ICML 2021, PMLR, pp 7927–7936 Ororbia and Kifer [2022] Ororbia A, Kifer D (2022) The neural coding framework for learning generative models. Nature communications 13(1):2064 Parno and Marzouk [2018] Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Marzouk Y, Moselhy T, Parno M, et al (2016) Sampling via measure transport: An introduction. Handbook of uncertainty quantification 1:2 Mnih et al [2015] Mnih V, Kavukcuoglu K, Silver D, et al (2015) Human-level control through deep reinforcement learning. 
nature 518(7540):529–533 MORIMURA [2010] MORIMURA T (2010) Parametric return density estimation for reinforcement learning. In: Conference on Uncertainty in Artificial Intelligence, 2010 Nam et al [2021] Nam DW, Kim Y, Park CY (2021) Gmac: A distributional perspective on actor-critic framework. In: ICML 2021, PMLR, pp 7927–7936 Ororbia and Kifer [2022] Ororbia A, Kifer D (2022) The neural coding framework for learning generative models. Nature communications 13(1):2064 Parno and Marzouk [2018] Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Mnih V, Kavukcuoglu K, Silver D, et al (2015) Human-level control through deep reinforcement learning. nature 518(7540):529–533 MORIMURA [2010] MORIMURA T (2010) Parametric return density estimation for reinforcement learning. In: Conference on Uncertainty in Artificial Intelligence, 2010 Nam et al [2021] Nam DW, Kim Y, Park CY (2021) Gmac: A distributional perspective on actor-critic framework. 
In: ICML 2021, PMLR, pp 7927–7936 Ororbia and Kifer [2022] Ororbia A, Kifer D (2022) The neural coding framework for learning generative models. Nature communications 13(1):2064 Parno and Marzouk [2018] Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University MORIMURA T (2010) Parametric return density estimation for reinforcement learning. In: Conference on Uncertainty in Artificial Intelligence, 2010 Nam et al [2021] Nam DW, Kim Y, Park CY (2021) Gmac: A distributional perspective on actor-critic framework. In: ICML 2021, PMLR, pp 7927–7936 Ororbia and Kifer [2022] Ororbia A, Kifer D (2022) The neural coding framework for learning generative models. Nature communications 13(1):2064 Parno and Marzouk [2018] Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. 
Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Nam DW, Kim Y, Park CY (2021) Gmac: A distributional perspective on actor-critic framework. In: ICML 2021, PMLR, pp 7927–7936 Ororbia and Kifer [2022] Ororbia A, Kifer D (2022) The neural coding framework for learning generative models. Nature communications 13(1):2064 Parno and Marzouk [2018] Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. 
In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Ororbia A, Kifer D (2022) The neural coding framework for learning generative models. Nature communications 13(1):2064 Parno and Marzouk [2018] Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. 
In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Haarnoja T, Zhou A, Hartikainen K, et al (2018) Soft actor-critic algorithms and applications. arXiv preprint arXiv:181205905 Heess et al [2015] Heess N, Wayne G, Silver D, et al (2015) Learning continuous control policies by stochastic value gradients. Advances in neural information processing systems 28 Kingma and Welling [2013] Kingma DP, Welling M (2013) Auto-encoding variational bayes. arXiv preprint arXiv:13126114 Kingma et al [2014] Kingma DP, Mohamed S, Jimenez Rezende D, et al (2014) Semi-supervised learning with deep generative models. NIPS 2014 27 Konda and Tsitsiklis [1999] Konda V, Tsitsiklis J (1999) Actor-critic algorithms. Advances in neural information processing systems 12 Kuznetsov et al [2020] Kuznetsov A, Shvechikov P, Grishin A, et al (2020) Controlling overestimation bias with truncated mixture of continuous distributional quantile critics. In: International Conference on Machine Learning, PMLR, pp 5556–5566 Lillicrap et al [2015] Lillicrap TP, Hunt JJ, Pritzel A, et al (2015) Continuous control with deep reinforcement learning. arXiv preprint arXiv:150902971 Lindenberg et al [2022] Lindenberg B, Nordqvist J, Lindahl KO (2022) Conjugated discrete distributions for distributional reinforcement learning. In: AAAI 2022, pp 7516–7524 Ma et al [2020] Ma X, Xia L, Zhou Z, et al (2020) Dsac: distributional soft actor critic for risk-sensitive reinforcement learning. arXiv preprint arXiv:200414547 Marzouk et al [2016] Marzouk Y, Moselhy T, Parno M, et al (2016) Sampling via measure transport: An introduction. Handbook of uncertainty quantification 1:2 Mnih et al [2015] Mnih V, Kavukcuoglu K, Silver D, et al (2015) Human-level control through deep reinforcement learning. nature 518(7540):529–533 MORIMURA [2010] MORIMURA T (2010) Parametric return density estimation for reinforcement learning. 
In: Conference on Uncertainty in Artificial Intelligence, 2010 Nam et al [2021] Nam DW, Kim Y, Park CY (2021) Gmac: A distributional perspective on actor-critic framework. In: ICML 2021, PMLR, pp 7927–7936 Ororbia and Kifer [2022] Ororbia A, Kifer D (2022) The neural coding framework for learning generative models. Nature communications 13(1):2064 Parno and Marzouk [2018] Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Heess N, Wayne G, Silver D, et al (2015) Learning continuous control policies by stochastic value gradients. Advances in neural information processing systems 28 Kingma and Welling [2013] Kingma DP, Welling M (2013) Auto-encoding variational bayes. arXiv preprint arXiv:13126114 Kingma et al [2014] Kingma DP, Mohamed S, Jimenez Rezende D, et al (2014) Semi-supervised learning with deep generative models. NIPS 2014 27 Konda and Tsitsiklis [1999] Konda V, Tsitsiklis J (1999) Actor-critic algorithms. 
Advances in neural information processing systems 12 Kuznetsov et al [2020] Kuznetsov A, Shvechikov P, Grishin A, et al (2020) Controlling overestimation bias with truncated mixture of continuous distributional quantile critics. In: International Conference on Machine Learning, PMLR, pp 5556–5566 Lillicrap et al [2015] Lillicrap TP, Hunt JJ, Pritzel A, et al (2015) Continuous control with deep reinforcement learning. arXiv preprint arXiv:150902971 Lindenberg et al [2022] Lindenberg B, Nordqvist J, Lindahl KO (2022) Conjugated discrete distributions for distributional reinforcement learning. In: AAAI 2022, pp 7516–7524 Ma et al [2020] Ma X, Xia L, Zhou Z, et al (2020) Dsac: distributional soft actor critic for risk-sensitive reinforcement learning. arXiv preprint arXiv:200414547 Marzouk et al [2016] Marzouk Y, Moselhy T, Parno M, et al (2016) Sampling via measure transport: An introduction. Handbook of uncertainty quantification 1:2 Mnih et al [2015] Mnih V, Kavukcuoglu K, Silver D, et al (2015) Human-level control through deep reinforcement learning. nature 518(7540):529–533 MORIMURA [2010] MORIMURA T (2010) Parametric return density estimation for reinforcement learning. In: Conference on Uncertainty in Artificial Intelligence, 2010 Nam et al [2021] Nam DW, Kim Y, Park CY (2021) Gmac: A distributional perspective on actor-critic framework. In: ICML 2021, PMLR, pp 7927–7936 Ororbia and Kifer [2022] Ororbia A, Kifer D (2022) The neural coding framework for learning generative models. Nature communications 13(1):2064 Parno and Marzouk [2018] Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. 
MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Kingma DP, Welling M (2013) Auto-encoding variational bayes. arXiv preprint arXiv:13126114 Kingma et al [2014] Kingma DP, Mohamed S, Jimenez Rezende D, et al (2014) Semi-supervised learning with deep generative models. NIPS 2014 27 Konda and Tsitsiklis [1999] Konda V, Tsitsiklis J (1999) Actor-critic algorithms. Advances in neural information processing systems 12 Kuznetsov et al [2020] Kuznetsov A, Shvechikov P, Grishin A, et al (2020) Controlling overestimation bias with truncated mixture of continuous distributional quantile critics. In: International Conference on Machine Learning, PMLR, pp 5556–5566 Lillicrap et al [2015] Lillicrap TP, Hunt JJ, Pritzel A, et al (2015) Continuous control with deep reinforcement learning. arXiv preprint arXiv:150902971 Lindenberg et al [2022] Lindenberg B, Nordqvist J, Lindahl KO (2022) Conjugated discrete distributions for distributional reinforcement learning. In: AAAI 2022, pp 7516–7524 Ma et al [2020] Ma X, Xia L, Zhou Z, et al (2020) Dsac: distributional soft actor critic for risk-sensitive reinforcement learning. arXiv preprint arXiv:200414547 Marzouk et al [2016] Marzouk Y, Moselhy T, Parno M, et al (2016) Sampling via measure transport: An introduction. Handbook of uncertainty quantification 1:2 Mnih et al [2015] Mnih V, Kavukcuoglu K, Silver D, et al (2015) Human-level control through deep reinforcement learning. nature 518(7540):529–533 MORIMURA [2010] MORIMURA T (2010) Parametric return density estimation for reinforcement learning. In: Conference on Uncertainty in Artificial Intelligence, 2010 Nam et al [2021] Nam DW, Kim Y, Park CY (2021) Gmac: A distributional perspective on actor-critic framework. In: ICML 2021, PMLR, pp 7927–7936 Ororbia and Kifer [2022] Ororbia A, Kifer D (2022) The neural coding framework for learning generative models. Nature communications 13(1):2064 Parno and Marzouk [2018] Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. 
Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Kingma DP, Mohamed S, Jimenez Rezende D, et al (2014) Semi-supervised learning with deep generative models. NIPS 2014 27 Konda and Tsitsiklis [1999] Konda V, Tsitsiklis J (1999) Actor-critic algorithms. Advances in neural information processing systems 12 Kuznetsov et al [2020] Kuznetsov A, Shvechikov P, Grishin A, et al (2020) Controlling overestimation bias with truncated mixture of continuous distributional quantile critics. In: International Conference on Machine Learning, PMLR, pp 5556–5566 Lillicrap et al [2015] Lillicrap TP, Hunt JJ, Pritzel A, et al (2015) Continuous control with deep reinforcement learning. arXiv preprint arXiv:150902971 Lindenberg et al [2022] Lindenberg B, Nordqvist J, Lindahl KO (2022) Conjugated discrete distributions for distributional reinforcement learning. In: AAAI 2022, pp 7516–7524 Ma et al [2020] Ma X, Xia L, Zhou Z, et al (2020) Dsac: distributional soft actor critic for risk-sensitive reinforcement learning. arXiv preprint arXiv:200414547 Marzouk et al [2016] Marzouk Y, Moselhy T, Parno M, et al (2016) Sampling via measure transport: An introduction. Handbook of uncertainty quantification 1:2 Mnih et al [2015] Mnih V, Kavukcuoglu K, Silver D, et al (2015) Human-level control through deep reinforcement learning. nature 518(7540):529–533 MORIMURA [2010] MORIMURA T (2010) Parametric return density estimation for reinforcement learning. In: Conference on Uncertainty in Artificial Intelligence, 2010 Nam et al [2021] Nam DW, Kim Y, Park CY (2021) Gmac: A distributional perspective on actor-critic framework. In: ICML 2021, PMLR, pp 7927–7936 Ororbia and Kifer [2022] Ororbia A, Kifer D (2022) The neural coding framework for learning generative models. Nature communications 13(1):2064 Parno and Marzouk [2018] Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. 
Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Konda V, Tsitsiklis J (1999) Actor-critic algorithms. Advances in neural information processing systems 12 Kuznetsov et al [2020] Kuznetsov A, Shvechikov P, Grishin A, et al (2020) Controlling overestimation bias with truncated mixture of continuous distributional quantile critics. In: International Conference on Machine Learning, PMLR, pp 5556–5566 Lillicrap et al [2015] Lillicrap TP, Hunt JJ, Pritzel A, et al (2015) Continuous control with deep reinforcement learning. arXiv preprint arXiv:150902971 Lindenberg et al [2022] Lindenberg B, Nordqvist J, Lindahl KO (2022) Conjugated discrete distributions for distributional reinforcement learning. In: AAAI 2022, pp 7516–7524 Ma et al [2020] Ma X, Xia L, Zhou Z, et al (2020) Dsac: distributional soft actor critic for risk-sensitive reinforcement learning. arXiv preprint arXiv:200414547 Marzouk et al [2016] Marzouk Y, Moselhy T, Parno M, et al (2016) Sampling via measure transport: An introduction. Handbook of uncertainty quantification 1:2 Mnih et al [2015] Mnih V, Kavukcuoglu K, Silver D, et al (2015) Human-level control through deep reinforcement learning. nature 518(7540):529–533 MORIMURA [2010] MORIMURA T (2010) Parametric return density estimation for reinforcement learning. 
In: Conference on Uncertainty in Artificial Intelligence, 2010 Nam et al [2021] Nam DW, Kim Y, Park CY (2021) Gmac: A distributional perspective on actor-critic framework. In: ICML 2021, PMLR, pp 7927–7936 Ororbia and Kifer [2022] Ororbia A, Kifer D (2022) The neural coding framework for learning generative models. Nature communications 13(1):2064 Parno and Marzouk [2018] Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Kuznetsov A, Shvechikov P, Grishin A, et al (2020) Controlling overestimation bias with truncated mixture of continuous distributional quantile critics. In: International Conference on Machine Learning, PMLR, pp 5556–5566 Lillicrap et al [2015] Lillicrap TP, Hunt JJ, Pritzel A, et al (2015) Continuous control with deep reinforcement learning. arXiv preprint arXiv:150902971 Lindenberg et al [2022] Lindenberg B, Nordqvist J, Lindahl KO (2022) Conjugated discrete distributions for distributional reinforcement learning. 
In: AAAI 2022, pp 7516–7524 Ma et al [2020] Ma X, Xia L, Zhou Z, et al (2020) Dsac: distributional soft actor critic for risk-sensitive reinforcement learning. arXiv preprint arXiv:200414547 Marzouk et al [2016] Marzouk Y, Moselhy T, Parno M, et al (2016) Sampling via measure transport: An introduction. Handbook of uncertainty quantification 1:2 Mnih et al [2015] Mnih V, Kavukcuoglu K, Silver D, et al (2015) Human-level control through deep reinforcement learning. nature 518(7540):529–533 MORIMURA [2010] MORIMURA T (2010) Parametric return density estimation for reinforcement learning. In: Conference on Uncertainty in Artificial Intelligence, 2010 Nam et al [2021] Nam DW, Kim Y, Park CY (2021) Gmac: A distributional perspective on actor-critic framework. In: ICML 2021, PMLR, pp 7927–7936 Ororbia and Kifer [2022] Ororbia A, Kifer D (2022) The neural coding framework for learning generative models. Nature communications 13(1):2064 Parno and Marzouk [2018] Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. 
Carnegie Mellon University Lillicrap TP, Hunt JJ, Pritzel A, et al (2015) Continuous control with deep reinforcement learning. arXiv preprint arXiv:150902971 Lindenberg et al [2022] Lindenberg B, Nordqvist J, Lindahl KO (2022) Conjugated discrete distributions for distributional reinforcement learning. In: AAAI 2022, pp 7516–7524 Ma et al [2020] Ma X, Xia L, Zhou Z, et al (2020) Dsac: distributional soft actor critic for risk-sensitive reinforcement learning. arXiv preprint arXiv:200414547 Marzouk et al [2016] Marzouk Y, Moselhy T, Parno M, et al (2016) Sampling via measure transport: An introduction. Handbook of uncertainty quantification 1:2 Mnih et al [2015] Mnih V, Kavukcuoglu K, Silver D, et al (2015) Human-level control through deep reinforcement learning. nature 518(7540):529–533 MORIMURA [2010] MORIMURA T (2010) Parametric return density estimation for reinforcement learning. In: Conference on Uncertainty in Artificial Intelligence, 2010 Nam et al [2021] Nam DW, Kim Y, Park CY (2021) Gmac: A distributional perspective on actor-critic framework. In: ICML 2021, PMLR, pp 7927–7936 Ororbia and Kifer [2022] Ororbia A, Kifer D (2022) The neural coding framework for learning generative models. Nature communications 13(1):2064 Parno and Marzouk [2018] Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. 
NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Lindenberg B, Nordqvist J, Lindahl KO (2022) Conjugated discrete distributions for distributional reinforcement learning. In: AAAI 2022, pp 7516–7524 Ma et al [2020] Ma X, Xia L, Zhou Z, et al (2020) Dsac: distributional soft actor critic for risk-sensitive reinforcement learning. arXiv preprint arXiv:200414547 Marzouk et al [2016] Marzouk Y, Moselhy T, Parno M, et al (2016) Sampling via measure transport: An introduction. Handbook of uncertainty quantification 1:2 Mnih et al [2015] Mnih V, Kavukcuoglu K, Silver D, et al (2015) Human-level control through deep reinforcement learning. nature 518(7540):529–533 MORIMURA [2010] MORIMURA T (2010) Parametric return density estimation for reinforcement learning. In: Conference on Uncertainty in Artificial Intelligence, 2010 Nam et al [2021] Nam DW, Kim Y, Park CY (2021) Gmac: A distributional perspective on actor-critic framework. In: ICML 2021, PMLR, pp 7927–7936 Ororbia and Kifer [2022] Ororbia A, Kifer D (2022) The neural coding framework for learning generative models. Nature communications 13(1):2064 Parno and Marzouk [2018] Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. 
Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Ma X, Xia L, Zhou Z, et al (2020) Dsac: distributional soft actor critic for risk-sensitive reinforcement learning. arXiv preprint arXiv:200414547 Marzouk et al [2016] Marzouk Y, Moselhy T, Parno M, et al (2016) Sampling via measure transport: An introduction. Handbook of uncertainty quantification 1:2 Mnih et al [2015] Mnih V, Kavukcuoglu K, Silver D, et al (2015) Human-level control through deep reinforcement learning. nature 518(7540):529–533 MORIMURA [2010] MORIMURA T (2010) Parametric return density estimation for reinforcement learning. In: Conference on Uncertainty in Artificial Intelligence, 2010 Nam et al [2021] Nam DW, Kim Y, Park CY (2021) Gmac: A distributional perspective on actor-critic framework. In: ICML 2021, PMLR, pp 7927–7936 Ororbia and Kifer [2022] Ororbia A, Kifer D (2022) The neural coding framework for learning generative models. Nature communications 13(1):2064 Parno and Marzouk [2018] Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. 
Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Marzouk Y, Moselhy T, Parno M, et al (2016) Sampling via measure transport: An introduction. Handbook of uncertainty quantification 1:2 Mnih et al [2015] Mnih V, Kavukcuoglu K, Silver D, et al (2015) Human-level control through deep reinforcement learning. nature 518(7540):529–533 MORIMURA [2010] MORIMURA T (2010) Parametric return density estimation for reinforcement learning. In: Conference on Uncertainty in Artificial Intelligence, 2010 Nam et al [2021] Nam DW, Kim Y, Park CY (2021) Gmac: A distributional perspective on actor-critic framework. In: ICML 2021, PMLR, pp 7927–7936 Ororbia and Kifer [2022] Ororbia A, Kifer D (2022) The neural coding framework for learning generative models. Nature communications 13(1):2064 Parno and Marzouk [2018] Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. 
Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Mnih V, Kavukcuoglu K, Silver D, et al (2015) Human-level control through deep reinforcement learning. nature 518(7540):529–533 MORIMURA [2010] MORIMURA T (2010) Parametric return density estimation for reinforcement learning. In: Conference on Uncertainty in Artificial Intelligence, 2010 Nam et al [2021] Nam DW, Kim Y, Park CY (2021) Gmac: A distributional perspective on actor-critic framework. In: ICML 2021, PMLR, pp 7927–7936 Ororbia and Kifer [2022] Ororbia A, Kifer D (2022) The neural coding framework for learning generative models. Nature communications 13(1):2064 Parno and Marzouk [2018] Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University MORIMURA T (2010) Parametric return density estimation for reinforcement learning. 
In: Conference on Uncertainty in Artificial Intelligence, 2010 Nam et al [2021] Nam DW, Kim Y, Park CY (2021) Gmac: A distributional perspective on actor-critic framework. In: ICML 2021, PMLR, pp 7927–7936 Ororbia and Kifer [2022] Ororbia A, Kifer D (2022) The neural coding framework for learning generative models. Nature communications 13(1):2064 Parno and Marzouk [2018] Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Nam DW, Kim Y, Park CY (2021) Gmac: A distributional perspective on actor-critic framework. In: ICML 2021, PMLR, pp 7927–7936 Ororbia and Kifer [2022] Ororbia A, Kifer D (2022) The neural coding framework for learning generative models. Nature communications 13(1):2064 Parno and Marzouk [2018] Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. 
Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Haarnoja T, Zhou A, Hartikainen K, et al (2018) Soft actor-critic algorithms and applications. arXiv preprint arXiv:181205905 Heess et al [2015] Heess N, Wayne G, Silver D, et al (2015) Learning continuous control policies by stochastic value gradients. Advances in neural information processing systems 28 Kingma and Welling [2013] Kingma DP, Welling M (2013) Auto-encoding variational bayes. arXiv preprint arXiv:13126114 Kingma et al [2014] Kingma DP, Mohamed S, Jimenez Rezende D, et al (2014) Semi-supervised learning with deep generative models. NIPS 2014 27 Konda and Tsitsiklis [1999] Konda V, Tsitsiklis J (1999) Actor-critic algorithms. Advances in neural information processing systems 12 Kuznetsov et al [2020] Kuznetsov A, Shvechikov P, Grishin A, et al (2020) Controlling overestimation bias with truncated mixture of continuous distributional quantile critics. In: International Conference on Machine Learning, PMLR, pp 5556–5566 Lillicrap et al [2015] Lillicrap TP, Hunt JJ, Pritzel A, et al (2015) Continuous control with deep reinforcement learning. arXiv preprint arXiv:150902971 Lindenberg et al [2022] Lindenberg B, Nordqvist J, Lindahl KO (2022) Conjugated discrete distributions for distributional reinforcement learning. In: AAAI 2022, pp 7516–7524 Ma et al [2020] Ma X, Xia L, Zhou Z, et al (2020) Dsac: distributional soft actor critic for risk-sensitive reinforcement learning. arXiv preprint arXiv:200414547 Marzouk et al [2016] Marzouk Y, Moselhy T, Parno M, et al (2016) Sampling via measure transport: An introduction. 
Handbook of uncertainty quantification 1:2 Mnih et al [2015] Mnih V, Kavukcuoglu K, Silver D, et al (2015) Human-level control through deep reinforcement learning. nature 518(7540):529–533 MORIMURA [2010] MORIMURA T (2010) Parametric return density estimation for reinforcement learning. In: Conference on Uncertainty in Artificial Intelligence, 2010 Nam et al [2021] Nam DW, Kim Y, Park CY (2021) Gmac: A distributional perspective on actor-critic framework. In: ICML 2021, PMLR, pp 7927–7936 Ororbia and Kifer [2022] Ororbia A, Kifer D (2022) The neural coding framework for learning generative models. Nature communications 13(1):2064 Parno and Marzouk [2018] Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Heess N, Wayne G, Silver D, et al (2015) Learning continuous control policies by stochastic value gradients. Advances in neural information processing systems 28 Kingma and Welling [2013] Kingma DP, Welling M (2013) Auto-encoding variational bayes. 
arXiv preprint arXiv:13126114 Kingma et al [2014] Kingma DP, Mohamed S, Jimenez Rezende D, et al (2014) Semi-supervised learning with deep generative models. NIPS 2014 27 Konda and Tsitsiklis [1999] Konda V, Tsitsiklis J (1999) Actor-critic algorithms. Advances in neural information processing systems 12 Kuznetsov et al [2020] Kuznetsov A, Shvechikov P, Grishin A, et al (2020) Controlling overestimation bias with truncated mixture of continuous distributional quantile critics. In: International Conference on Machine Learning, PMLR, pp 5556–5566 Lillicrap et al [2015] Lillicrap TP, Hunt JJ, Pritzel A, et al (2015) Continuous control with deep reinforcement learning. arXiv preprint arXiv:150902971 Lindenberg et al [2022] Lindenberg B, Nordqvist J, Lindahl KO (2022) Conjugated discrete distributions for distributional reinforcement learning. In: AAAI 2022, pp 7516–7524 Ma et al [2020] Ma X, Xia L, Zhou Z, et al (2020) Dsac: distributional soft actor critic for risk-sensitive reinforcement learning. arXiv preprint arXiv:200414547 Marzouk et al [2016] Marzouk Y, Moselhy T, Parno M, et al (2016) Sampling via measure transport: An introduction. Handbook of uncertainty quantification 1:2 Mnih et al [2015] Mnih V, Kavukcuoglu K, Silver D, et al (2015) Human-level control through deep reinforcement learning. nature 518(7540):529–533 MORIMURA [2010] MORIMURA T (2010) Parametric return density estimation for reinforcement learning. In: Conference on Uncertainty in Artificial Intelligence, 2010 Nam et al [2021] Nam DW, Kim Y, Park CY (2021) Gmac: A distributional perspective on actor-critic framework. In: ICML 2021, PMLR, pp 7927–7936 Ororbia and Kifer [2022] Ororbia A, Kifer D (2022) The neural coding framework for learning generative models. Nature communications 13(1):2064 Parno and Marzouk [2018] Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. 
In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Kingma DP, Welling M (2013) Auto-encoding variational bayes. arXiv preprint arXiv:13126114 Kingma et al [2014] Kingma DP, Mohamed S, Jimenez Rezende D, et al (2014) Semi-supervised learning with deep generative models. NIPS 2014 27 Konda and Tsitsiklis [1999] Konda V, Tsitsiklis J (1999) Actor-critic algorithms. Advances in neural information processing systems 12 Kuznetsov et al [2020] Kuznetsov A, Shvechikov P, Grishin A, et al (2020) Controlling overestimation bias with truncated mixture of continuous distributional quantile critics. In: International Conference on Machine Learning, PMLR, pp 5556–5566 Lillicrap et al [2015] Lillicrap TP, Hunt JJ, Pritzel A, et al (2015) Continuous control with deep reinforcement learning. arXiv preprint arXiv:150902971 Lindenberg et al [2022] Lindenberg B, Nordqvist J, Lindahl KO (2022) Conjugated discrete distributions for distributional reinforcement learning. In: AAAI 2022, pp 7516–7524 Ma et al [2020] Ma X, Xia L, Zhou Z, et al (2020) Dsac: distributional soft actor critic for risk-sensitive reinforcement learning. arXiv preprint arXiv:200414547 Marzouk et al [2016] Marzouk Y, Moselhy T, Parno M, et al (2016) Sampling via measure transport: An introduction. Handbook of uncertainty quantification 1:2 Mnih et al [2015] Mnih V, Kavukcuoglu K, Silver D, et al (2015) Human-level control through deep reinforcement learning. nature 518(7540):529–533 MORIMURA [2010] MORIMURA T (2010) Parametric return density estimation for reinforcement learning. In: Conference on Uncertainty in Artificial Intelligence, 2010 Nam et al [2021] Nam DW, Kim Y, Park CY (2021) Gmac: A distributional perspective on actor-critic framework. In: ICML 2021, PMLR, pp 7927–7936 Ororbia and Kifer [2022] Ororbia A, Kifer D (2022) The neural coding framework for learning generative models. Nature communications 13(1):2064 Parno and Marzouk [2018] Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. 
Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Kingma DP, Mohamed S, Jimenez Rezende D, et al (2014) Semi-supervised learning with deep generative models. NIPS 2014 27 Konda and Tsitsiklis [1999] Konda V, Tsitsiklis J (1999) Actor-critic algorithms. Advances in neural information processing systems 12 Kuznetsov et al [2020] Kuznetsov A, Shvechikov P, Grishin A, et al (2020) Controlling overestimation bias with truncated mixture of continuous distributional quantile critics. In: International Conference on Machine Learning, PMLR, pp 5556–5566 Lillicrap et al [2015] Lillicrap TP, Hunt JJ, Pritzel A, et al (2015) Continuous control with deep reinforcement learning. arXiv preprint arXiv:150902971 Lindenberg et al [2022] Lindenberg B, Nordqvist J, Lindahl KO (2022) Conjugated discrete distributions for distributional reinforcement learning. In: AAAI 2022, pp 7516–7524 Ma et al [2020] Ma X, Xia L, Zhou Z, et al (2020) Dsac: distributional soft actor critic for risk-sensitive reinforcement learning. arXiv preprint arXiv:200414547 Marzouk et al [2016] Marzouk Y, Moselhy T, Parno M, et al (2016) Sampling via measure transport: An introduction. Handbook of uncertainty quantification 1:2 Mnih et al [2015] Mnih V, Kavukcuoglu K, Silver D, et al (2015) Human-level control through deep reinforcement learning. nature 518(7540):529–533 MORIMURA [2010] MORIMURA T (2010) Parametric return density estimation for reinforcement learning. In: Conference on Uncertainty in Artificial Intelligence, 2010 Nam et al [2021] Nam DW, Kim Y, Park CY (2021) Gmac: A distributional perspective on actor-critic framework. 
In: ICML 2021, PMLR, pp 7927–7936 Ororbia and Kifer [2022] Ororbia A, Kifer D (2022) The neural coding framework for learning generative models. Nature communications 13(1):2064 Parno and Marzouk [2018] Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Konda V, Tsitsiklis J (1999) Actor-critic algorithms. Advances in neural information processing systems 12 Kuznetsov et al [2020] Kuznetsov A, Shvechikov P, Grishin A, et al (2020) Controlling overestimation bias with truncated mixture of continuous distributional quantile critics. In: International Conference on Machine Learning, PMLR, pp 5556–5566 Lillicrap et al [2015] Lillicrap TP, Hunt JJ, Pritzel A, et al (2015) Continuous control with deep reinforcement learning. arXiv preprint arXiv:150902971 Lindenberg et al [2022] Lindenberg B, Nordqvist J, Lindahl KO (2022) Conjugated discrete distributions for distributional reinforcement learning. In: AAAI 2022, pp 7516–7524 Ma et al [2020] Ma X, Xia L, Zhou Z, et al (2020) Dsac: distributional soft actor critic for risk-sensitive reinforcement learning. 
arXiv preprint arXiv:200414547 Marzouk et al [2016] Marzouk Y, Moselhy T, Parno M, et al (2016) Sampling via measure transport: An introduction. Handbook of uncertainty quantification 1:2 Mnih et al [2015] Mnih V, Kavukcuoglu K, Silver D, et al (2015) Human-level control through deep reinforcement learning. nature 518(7540):529–533 MORIMURA [2010] MORIMURA T (2010) Parametric return density estimation for reinforcement learning. In: Conference on Uncertainty in Artificial Intelligence, 2010 Nam et al [2021] Nam DW, Kim Y, Park CY (2021) Gmac: A distributional perspective on actor-critic framework. In: ICML 2021, PMLR, pp 7927–7936 Ororbia and Kifer [2022] Ororbia A, Kifer D (2022) The neural coding framework for learning generative models. Nature communications 13(1):2064 Parno and Marzouk [2018] Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Kuznetsov A, Shvechikov P, Grishin A, et al (2020) Controlling overestimation bias with truncated mixture of continuous distributional quantile critics. 
In: International Conference on Machine Learning, PMLR, pp 5556–5566 Lillicrap et al [2015] Lillicrap TP, Hunt JJ, Pritzel A, et al (2015) Continuous control with deep reinforcement learning. arXiv preprint arXiv:150902971 Lindenberg et al [2022] Lindenberg B, Nordqvist J, Lindahl KO (2022) Conjugated discrete distributions for distributional reinforcement learning. In: AAAI 2022, pp 7516–7524 Ma et al [2020] Ma X, Xia L, Zhou Z, et al (2020) Dsac: distributional soft actor critic for risk-sensitive reinforcement learning. arXiv preprint arXiv:200414547 Marzouk et al [2016] Marzouk Y, Moselhy T, Parno M, et al (2016) Sampling via measure transport: An introduction. Handbook of uncertainty quantification 1:2 Mnih et al [2015] Mnih V, Kavukcuoglu K, Silver D, et al (2015) Human-level control through deep reinforcement learning. nature 518(7540):529–533 MORIMURA [2010] MORIMURA T (2010) Parametric return density estimation for reinforcement learning. In: Conference on Uncertainty in Artificial Intelligence, 2010 Nam et al [2021] Nam DW, Kim Y, Park CY (2021) Gmac: A distributional perspective on actor-critic framework. In: ICML 2021, PMLR, pp 7927–7936 Ororbia and Kifer [2022] Ororbia A, Kifer D (2022) The neural coding framework for learning generative models. Nature communications 13(1):2064 Parno and Marzouk [2018] Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. 
NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Lillicrap TP, Hunt JJ, Pritzel A, et al (2015) Continuous control with deep reinforcement learning. arXiv preprint arXiv:150902971 Lindenberg et al [2022] Lindenberg B, Nordqvist J, Lindahl KO (2022) Conjugated discrete distributions for distributional reinforcement learning. In: AAAI 2022, pp 7516–7524 Ma et al [2020] Ma X, Xia L, Zhou Z, et al (2020) Dsac: distributional soft actor critic for risk-sensitive reinforcement learning. arXiv preprint arXiv:200414547 Marzouk et al [2016] Marzouk Y, Moselhy T, Parno M, et al (2016) Sampling via measure transport: An introduction. Handbook of uncertainty quantification 1:2 Mnih et al [2015] Mnih V, Kavukcuoglu K, Silver D, et al (2015) Human-level control through deep reinforcement learning. nature 518(7540):529–533 MORIMURA [2010] MORIMURA T (2010) Parametric return density estimation for reinforcement learning. In: Conference on Uncertainty in Artificial Intelligence, 2010 Nam et al [2021] Nam DW, Kim Y, Park CY (2021) Gmac: A distributional perspective on actor-critic framework. In: ICML 2021, PMLR, pp 7927–7936 Ororbia and Kifer [2022] Ororbia A, Kifer D (2022) The neural coding framework for learning generative models. Nature communications 13(1):2064 Parno and Marzouk [2018] Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. 
MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Lindenberg B, Nordqvist J, Lindahl KO (2022) Conjugated discrete distributions for distributional reinforcement learning. In: AAAI 2022, pp 7516–7524 Ma et al [2020] Ma X, Xia L, Zhou Z, et al (2020) Dsac: distributional soft actor critic for risk-sensitive reinforcement learning. arXiv preprint arXiv:200414547 Marzouk et al [2016] Marzouk Y, Moselhy T, Parno M, et al (2016) Sampling via measure transport: An introduction. Handbook of uncertainty quantification 1:2 Mnih et al [2015] Mnih V, Kavukcuoglu K, Silver D, et al (2015) Human-level control through deep reinforcement learning. nature 518(7540):529–533 MORIMURA [2010] MORIMURA T (2010) Parametric return density estimation for reinforcement learning. In: Conference on Uncertainty in Artificial Intelligence, 2010 Nam et al [2021] Nam DW, Kim Y, Park CY (2021) Gmac: A distributional perspective on actor-critic framework. In: ICML 2021, PMLR, pp 7927–7936 Ororbia and Kifer [2022] Ororbia A, Kifer D (2022) The neural coding framework for learning generative models. Nature communications 13(1):2064 Parno and Marzouk [2018] Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. 
In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Ma X, Xia L, Zhou Z, et al (2020) Dsac: distributional soft actor critic for risk-sensitive reinforcement learning. arXiv preprint arXiv:200414547 Marzouk et al [2016] Marzouk Y, Moselhy T, Parno M, et al (2016) Sampling via measure transport: An introduction. Handbook of uncertainty quantification 1:2 Mnih et al [2015] Mnih V, Kavukcuoglu K, Silver D, et al (2015) Human-level control through deep reinforcement learning. nature 518(7540):529–533 MORIMURA [2010] MORIMURA T (2010) Parametric return density estimation for reinforcement learning. In: Conference on Uncertainty in Artificial Intelligence, 2010 Nam et al [2021] Nam DW, Kim Y, Park CY (2021) Gmac: A distributional perspective on actor-critic framework. In: ICML 2021, PMLR, pp 7927–7936 Ororbia and Kifer [2022] Ororbia A, Kifer D (2022) The neural coding framework for learning generative models. Nature communications 13(1):2064 Parno and Marzouk [2018] Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. 
In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Marzouk Y, Moselhy T, Parno M, et al (2016) Sampling via measure transport: An introduction. Handbook of uncertainty quantification 1:2 Mnih et al [2015] Mnih V, Kavukcuoglu K, Silver D, et al (2015) Human-level control through deep reinforcement learning. nature 518(7540):529–533 MORIMURA [2010] MORIMURA T (2010) Parametric return density estimation for reinforcement learning. In: Conference on Uncertainty in Artificial Intelligence, 2010 Nam et al [2021] Nam DW, Kim Y, Park CY (2021) Gmac: A distributional perspective on actor-critic framework. In: ICML 2021, PMLR, pp 7927–7936 Ororbia and Kifer [2022] Ororbia A, Kifer D (2022) The neural coding framework for learning generative models. Nature communications 13(1):2064 Parno and Marzouk [2018] Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. 
Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Mnih V, Kavukcuoglu K, Silver D, et al (2015) Human-level control through deep reinforcement learning. nature 518(7540):529–533 MORIMURA [2010] MORIMURA T (2010) Parametric return density estimation for reinforcement learning. In: Conference on Uncertainty in Artificial Intelligence, 2010 Nam et al [2021] Nam DW, Kim Y, Park CY (2021) Gmac: A distributional perspective on actor-critic framework. In: ICML 2021, PMLR, pp 7927–7936 Ororbia and Kifer [2022] Ororbia A, Kifer D (2022) The neural coding framework for learning generative models. Nature communications 13(1):2064 Parno and Marzouk [2018] Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. 
Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University
In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Heess N, Wayne G, Silver D, et al (2015) Learning continuous control policies by stochastic value gradients. Advances in neural information processing systems 28 Kingma and Welling [2013] Kingma DP, Welling M (2013) Auto-encoding variational bayes. arXiv preprint arXiv:13126114 Kingma et al [2014] Kingma DP, Mohamed S, Jimenez Rezende D, et al (2014) Semi-supervised learning with deep generative models. NIPS 2014 27 Konda and Tsitsiklis [1999] Konda V, Tsitsiklis J (1999) Actor-critic algorithms. Advances in neural information processing systems 12 Kuznetsov et al [2020] Kuznetsov A, Shvechikov P, Grishin A, et al (2020) Controlling overestimation bias with truncated mixture of continuous distributional quantile critics. In: International Conference on Machine Learning, PMLR, pp 5556–5566 Lillicrap et al [2015] Lillicrap TP, Hunt JJ, Pritzel A, et al (2015) Continuous control with deep reinforcement learning. arXiv preprint arXiv:150902971 Lindenberg et al [2022] Lindenberg B, Nordqvist J, Lindahl KO (2022) Conjugated discrete distributions for distributional reinforcement learning. In: AAAI 2022, pp 7516–7524 Ma et al [2020] Ma X, Xia L, Zhou Z, et al (2020) Dsac: distributional soft actor critic for risk-sensitive reinforcement learning. arXiv preprint arXiv:200414547 Marzouk et al [2016] Marzouk Y, Moselhy T, Parno M, et al (2016) Sampling via measure transport: An introduction. Handbook of uncertainty quantification 1:2 Mnih et al [2015] Mnih V, Kavukcuoglu K, Silver D, et al (2015) Human-level control through deep reinforcement learning. nature 518(7540):529–533 MORIMURA [2010] MORIMURA T (2010) Parametric return density estimation for reinforcement learning. In: Conference on Uncertainty in Artificial Intelligence, 2010 Nam et al [2021] Nam DW, Kim Y, Park CY (2021) Gmac: A distributional perspective on actor-critic framework. In: ICML 2021, PMLR, pp 7927–7936 Ororbia and Kifer [2022] Ororbia A, Kifer D (2022) The neural coding framework for learning generative models. Nature communications 13(1):2064 Parno and Marzouk [2018] Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. 
Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Kingma DP, Welling M (2013) Auto-encoding variational bayes. arXiv preprint arXiv:13126114 Kingma et al [2014] Kingma DP, Mohamed S, Jimenez Rezende D, et al (2014) Semi-supervised learning with deep generative models. NIPS 2014 27 Konda and Tsitsiklis [1999] Konda V, Tsitsiklis J (1999) Actor-critic algorithms. Advances in neural information processing systems 12 Kuznetsov et al [2020] Kuznetsov A, Shvechikov P, Grishin A, et al (2020) Controlling overestimation bias with truncated mixture of continuous distributional quantile critics. In: International Conference on Machine Learning, PMLR, pp 5556–5566 Lillicrap et al [2015] Lillicrap TP, Hunt JJ, Pritzel A, et al (2015) Continuous control with deep reinforcement learning. arXiv preprint arXiv:150902971 Lindenberg et al [2022] Lindenberg B, Nordqvist J, Lindahl KO (2022) Conjugated discrete distributions for distributional reinforcement learning. In: AAAI 2022, pp 7516–7524 Ma et al [2020] Ma X, Xia L, Zhou Z, et al (2020) Dsac: distributional soft actor critic for risk-sensitive reinforcement learning. arXiv preprint arXiv:200414547 Marzouk et al [2016] Marzouk Y, Moselhy T, Parno M, et al (2016) Sampling via measure transport: An introduction. 
Handbook of uncertainty quantification 1:2 Mnih et al [2015] Mnih V, Kavukcuoglu K, Silver D, et al (2015) Human-level control through deep reinforcement learning. nature 518(7540):529–533 MORIMURA [2010] MORIMURA T (2010) Parametric return density estimation for reinforcement learning. In: Conference on Uncertainty in Artificial Intelligence, 2010 Nam et al [2021] Nam DW, Kim Y, Park CY (2021) Gmac: A distributional perspective on actor-critic framework. In: ICML 2021, PMLR, pp 7927–7936 Ororbia and Kifer [2022] Ororbia A, Kifer D (2022) The neural coding framework for learning generative models. Nature communications 13(1):2064 Parno and Marzouk [2018] Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Kingma DP, Mohamed S, Jimenez Rezende D, et al (2014) Semi-supervised learning with deep generative models. NIPS 2014 27 Konda and Tsitsiklis [1999] Konda V, Tsitsiklis J (1999) Actor-critic algorithms. 
Advances in neural information processing systems 12 Kuznetsov et al [2020] Kuznetsov A, Shvechikov P, Grishin A, et al (2020) Controlling overestimation bias with truncated mixture of continuous distributional quantile critics. In: International Conference on Machine Learning, PMLR, pp 5556–5566 Lillicrap et al [2015] Lillicrap TP, Hunt JJ, Pritzel A, et al (2015) Continuous control with deep reinforcement learning. arXiv preprint arXiv:150902971 Lindenberg et al [2022] Lindenberg B, Nordqvist J, Lindahl KO (2022) Conjugated discrete distributions for distributional reinforcement learning. In: AAAI 2022, pp 7516–7524 Ma et al [2020] Ma X, Xia L, Zhou Z, et al (2020) Dsac: distributional soft actor critic for risk-sensitive reinforcement learning. arXiv preprint arXiv:200414547 Marzouk et al [2016] Marzouk Y, Moselhy T, Parno M, et al (2016) Sampling via measure transport: An introduction. Handbook of uncertainty quantification 1:2 Mnih et al [2015] Mnih V, Kavukcuoglu K, Silver D, et al (2015) Human-level control through deep reinforcement learning. nature 518(7540):529–533 MORIMURA [2010] MORIMURA T (2010) Parametric return density estimation for reinforcement learning. In: Conference on Uncertainty in Artificial Intelligence, 2010 Nam et al [2021] Nam DW, Kim Y, Park CY (2021) Gmac: A distributional perspective on actor-critic framework. In: ICML 2021, PMLR, pp 7927–7936 Ororbia and Kifer [2022] Ororbia A, Kifer D (2022) The neural coding framework for learning generative models. Nature communications 13(1):2064 Parno and Marzouk [2018] Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. 
MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Konda V, Tsitsiklis J (1999) Actor-critic algorithms. Advances in neural information processing systems 12 Kuznetsov et al [2020] Kuznetsov A, Shvechikov P, Grishin A, et al (2020) Controlling overestimation bias with truncated mixture of continuous distributional quantile critics. In: International Conference on Machine Learning, PMLR, pp 5556–5566 Lillicrap et al [2015] Lillicrap TP, Hunt JJ, Pritzel A, et al (2015) Continuous control with deep reinforcement learning. arXiv preprint arXiv:150902971 Lindenberg et al [2022] Lindenberg B, Nordqvist J, Lindahl KO (2022) Conjugated discrete distributions for distributional reinforcement learning. In: AAAI 2022, pp 7516–7524 Ma et al [2020] Ma X, Xia L, Zhou Z, et al (2020) Dsac: distributional soft actor critic for risk-sensitive reinforcement learning. arXiv preprint arXiv:200414547 Marzouk et al [2016] Marzouk Y, Moselhy T, Parno M, et al (2016) Sampling via measure transport: An introduction. Handbook of uncertainty quantification 1:2 Mnih et al [2015] Mnih V, Kavukcuoglu K, Silver D, et al (2015) Human-level control through deep reinforcement learning. nature 518(7540):529–533 MORIMURA [2010] MORIMURA T (2010) Parametric return density estimation for reinforcement learning. In: Conference on Uncertainty in Artificial Intelligence, 2010 Nam et al [2021] Nam DW, Kim Y, Park CY (2021) Gmac: A distributional perspective on actor-critic framework. In: ICML 2021, PMLR, pp 7927–7936 Ororbia and Kifer [2022] Ororbia A, Kifer D (2022) The neural coding framework for learning generative models. Nature communications 13(1):2064 Parno and Marzouk [2018] Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. 
arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Kuznetsov A, Shvechikov P, Grishin A, et al (2020) Controlling overestimation bias with truncated mixture of continuous distributional quantile critics. In: International Conference on Machine Learning, PMLR, pp 5556–5566 Lillicrap et al [2015] Lillicrap TP, Hunt JJ, Pritzel A, et al (2015) Continuous control with deep reinforcement learning. arXiv preprint arXiv:150902971 Lindenberg et al [2022] Lindenberg B, Nordqvist J, Lindahl KO (2022) Conjugated discrete distributions for distributional reinforcement learning. In: AAAI 2022, pp 7516–7524 Ma et al [2020] Ma X, Xia L, Zhou Z, et al (2020) Dsac: distributional soft actor critic for risk-sensitive reinforcement learning. arXiv preprint arXiv:200414547 Marzouk et al [2016] Marzouk Y, Moselhy T, Parno M, et al (2016) Sampling via measure transport: An introduction. Handbook of uncertainty quantification 1:2 Mnih et al [2015] Mnih V, Kavukcuoglu K, Silver D, et al (2015) Human-level control through deep reinforcement learning. nature 518(7540):529–533 MORIMURA [2010] MORIMURA T (2010) Parametric return density estimation for reinforcement learning. In: Conference on Uncertainty in Artificial Intelligence, 2010 Nam et al [2021] Nam DW, Kim Y, Park CY (2021) Gmac: A distributional perspective on actor-critic framework. In: ICML 2021, PMLR, pp 7927–7936 Ororbia and Kifer [2022] Ororbia A, Kifer D (2022) The neural coding framework for learning generative models. Nature communications 13(1):2064 Parno and Marzouk [2018] Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. 
Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Lillicrap TP, Hunt JJ, Pritzel A, et al (2015) Continuous control with deep reinforcement learning. arXiv preprint arXiv:150902971 Lindenberg et al [2022] Lindenberg B, Nordqvist J, Lindahl KO (2022) Conjugated discrete distributions for distributional reinforcement learning. In: AAAI 2022, pp 7516–7524 Ma et al [2020] Ma X, Xia L, Zhou Z, et al (2020) Dsac: distributional soft actor critic for risk-sensitive reinforcement learning. arXiv preprint arXiv:200414547 Marzouk et al [2016] Marzouk Y, Moselhy T, Parno M, et al (2016) Sampling via measure transport: An introduction. Handbook of uncertainty quantification 1:2 Mnih et al [2015] Mnih V, Kavukcuoglu K, Silver D, et al (2015) Human-level control through deep reinforcement learning. nature 518(7540):529–533 MORIMURA [2010] MORIMURA T (2010) Parametric return density estimation for reinforcement learning. In: Conference on Uncertainty in Artificial Intelligence, 2010 Nam et al [2021] Nam DW, Kim Y, Park CY (2021) Gmac: A distributional perspective on actor-critic framework. In: ICML 2021, PMLR, pp 7927–7936 Ororbia and Kifer [2022] Ororbia A, Kifer D (2022) The neural coding framework for learning generative models. Nature communications 13(1):2064 Parno and Marzouk [2018] Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. 
Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Lindenberg B, Nordqvist J, Lindahl KO (2022) Conjugated discrete distributions for distributional reinforcement learning. In: AAAI 2022, pp 7516–7524 Ma et al [2020] Ma X, Xia L, Zhou Z, et al (2020) Dsac: distributional soft actor critic for risk-sensitive reinforcement learning. arXiv preprint arXiv:200414547 Marzouk et al [2016] Marzouk Y, Moselhy T, Parno M, et al (2016) Sampling via measure transport: An introduction. Handbook of uncertainty quantification 1:2 Mnih et al [2015] Mnih V, Kavukcuoglu K, Silver D, et al (2015) Human-level control through deep reinforcement learning. nature 518(7540):529–533 MORIMURA [2010] MORIMURA T (2010) Parametric return density estimation for reinforcement learning. In: Conference on Uncertainty in Artificial Intelligence, 2010 Nam et al [2021] Nam DW, Kim Y, Park CY (2021) Gmac: A distributional perspective on actor-critic framework. In: ICML 2021, PMLR, pp 7927–7936 Ororbia and Kifer [2022] Ororbia A, Kifer D (2022) The neural coding framework for learning generative models. Nature communications 13(1):2064 Parno and Marzouk [2018] Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. 
Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Ma X, Xia L, Zhou Z, et al (2020) Dsac: distributional soft actor critic for risk-sensitive reinforcement learning. arXiv preprint arXiv:200414547 Marzouk et al [2016] Marzouk Y, Moselhy T, Parno M, et al (2016) Sampling via measure transport: An introduction. Handbook of uncertainty quantification 1:2 Mnih et al [2015] Mnih V, Kavukcuoglu K, Silver D, et al (2015) Human-level control through deep reinforcement learning. nature 518(7540):529–533 MORIMURA [2010] MORIMURA T (2010) Parametric return density estimation for reinforcement learning. In: Conference on Uncertainty in Artificial Intelligence, 2010 Nam et al [2021] Nam DW, Kim Y, Park CY (2021) Gmac: A distributional perspective on actor-critic framework. In: ICML 2021, PMLR, pp 7927–7936 Ororbia and Kifer [2022] Ororbia A, Kifer D (2022) The neural coding framework for learning generative models. Nature communications 13(1):2064 Parno and Marzouk [2018] Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. 
Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Marzouk Y, Moselhy T, Parno M, et al (2016) Sampling via measure transport: An introduction. Handbook of uncertainty quantification 1:2 Mnih et al [2015] Mnih V, Kavukcuoglu K, Silver D, et al (2015) Human-level control through deep reinforcement learning. nature 518(7540):529–533 MORIMURA [2010] MORIMURA T (2010) Parametric return density estimation for reinforcement learning. In: Conference on Uncertainty in Artificial Intelligence, 2010 Nam et al [2021] Nam DW, Kim Y, Park CY (2021) Gmac: A distributional perspective on actor-critic framework. In: ICML 2021, PMLR, pp 7927–7936 Ororbia and Kifer [2022] Ororbia A, Kifer D (2022) The neural coding framework for learning generative models. Nature communications 13(1):2064 Parno and Marzouk [2018] Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. 
Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Mnih V, Kavukcuoglu K, Silver D, et al (2015) Human-level control through deep reinforcement learning. nature 518(7540):529–533 MORIMURA [2010] MORIMURA T (2010) Parametric return density estimation for reinforcement learning. In: Conference on Uncertainty in Artificial Intelligence, 2010 Nam et al [2021] Nam DW, Kim Y, Park CY (2021) Gmac: A distributional perspective on actor-critic framework. In: ICML 2021, PMLR, pp 7927–7936 Ororbia and Kifer [2022] Ororbia A, Kifer D (2022) The neural coding framework for learning generative models. Nature communications 13(1):2064 Parno and Marzouk [2018] Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. 
Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University MORIMURA T (2010) Parametric return density estimation for reinforcement learning. In: Conference on Uncertainty in Artificial Intelligence, 2010 Nam et al [2021] Nam DW, Kim Y, Park CY (2021) Gmac: A distributional perspective on actor-critic framework. In: ICML 2021, PMLR, pp 7927–7936 Ororbia and Kifer [2022] Ororbia A, Kifer D (2022) The neural coding framework for learning generative models. Nature communications 13(1):2064 Parno and Marzouk [2018] Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. 
arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Nam DW, Kim Y, Park CY (2021) Gmac: A distributional perspective on actor-critic framework. In: ICML 2021, PMLR, pp 7927–7936 Ororbia and Kifer [2022] Ororbia A, Kifer D (2022) The neural coding framework for learning generative models. Nature communications 13(1):2064 Parno and Marzouk [2018] Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. 
MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Ororbia A, Kifer D (2022) The neural coding framework for learning generative models. Nature communications 13(1):2064 Parno and Marzouk [2018] Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. 
Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. 
In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. 
Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. 
Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. 
arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. 
In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University
- Dabney W, Rowland M, Bellemare M, et al (2018b) Distributional reinforcement learning with quantile regression. In: AAAI 2018 Duan et al [2021] Duan J, Guan Y, Li SE, et al (2021) Distributional soft actor-critic: Off-policy reinforcement learning for addressing value estimation errors. IEEE transactions on neural networks and learning systems Engel et al [2005] Engel Y, Mannor S, Meir R (2005) Reinforcement learning with gaussian processes. In: Proceedings of the 22nd international conference on Machine learning, pp 201–208 Fujimoto et al [2018] Fujimoto S, Hoof H, Meger D (2018) Addressing function approximation error in actor-critic methods. In: ICML 2018, PMLR, pp 1587–1596 Goodfellow et al [2020] Goodfellow I, Pouget-Abadie J, Mirza M, et al (2020) Generative adversarial networks. Communications of the ACM 63(11):139–144 Haarnoja et al [2017] Haarnoja T, Tang H, Abbeel P, et al (2017) Reinforcement learning with deep energy-based policies. In: ICML 2017, PMLR, pp 1352–1361 Haarnoja et al [2018] Haarnoja T, Zhou A, Hartikainen K, et al (2018) Soft actor-critic algorithms and applications. arXiv preprint arXiv:181205905 Heess et al [2015] Heess N, Wayne G, Silver D, et al (2015) Learning continuous control policies by stochastic value gradients. Advances in neural information processing systems 28 Kingma and Welling [2013] Kingma DP, Welling M (2013) Auto-encoding variational bayes. arXiv preprint arXiv:13126114 Kingma et al [2014] Kingma DP, Mohamed S, Jimenez Rezende D, et al (2014) Semi-supervised learning with deep generative models. NIPS 2014 27 Konda and Tsitsiklis [1999] Konda V, Tsitsiklis J (1999) Actor-critic algorithms. Advances in neural information processing systems 12 Kuznetsov et al [2020] Kuznetsov A, Shvechikov P, Grishin A, et al (2020) Controlling overestimation bias with truncated mixture of continuous distributional quantile critics. In: International Conference on Machine Learning, PMLR, pp 5556–5566 Lillicrap et al [2015] Lillicrap TP, Hunt JJ, Pritzel A, et al (2015) Continuous control with deep reinforcement learning. arXiv preprint arXiv:150902971 Lindenberg et al [2022] Lindenberg B, Nordqvist J, Lindahl KO (2022) Conjugated discrete distributions for distributional reinforcement learning. In: AAAI 2022, pp 7516–7524 Ma et al [2020] Ma X, Xia L, Zhou Z, et al (2020) Dsac: distributional soft actor critic for risk-sensitive reinforcement learning. arXiv preprint arXiv:200414547 Marzouk et al [2016] Marzouk Y, Moselhy T, Parno M, et al (2016) Sampling via measure transport: An introduction. Handbook of uncertainty quantification 1:2 Mnih et al [2015] Mnih V, Kavukcuoglu K, Silver D, et al (2015) Human-level control through deep reinforcement learning. nature 518(7540):529–533 MORIMURA [2010] MORIMURA T (2010) Parametric return density estimation for reinforcement learning. In: Conference on Uncertainty in Artificial Intelligence, 2010 Nam et al [2021] Nam DW, Kim Y, Park CY (2021) Gmac: A distributional perspective on actor-critic framework. In: ICML 2021, PMLR, pp 7927–7936 Ororbia and Kifer [2022] Ororbia A, Kifer D (2022) The neural coding framework for learning generative models. Nature communications 13(1):2064 Parno and Marzouk [2018] Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. 
Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Duan J, Guan Y, Li SE, et al (2021) Distributional soft actor-critic: Off-policy reinforcement learning for addressing value estimation errors. IEEE transactions on neural networks and learning systems Engel et al [2005] Engel Y, Mannor S, Meir R (2005) Reinforcement learning with gaussian processes. In: Proceedings of the 22nd international conference on Machine learning, pp 201–208 Fujimoto et al [2018] Fujimoto S, Hoof H, Meger D (2018) Addressing function approximation error in actor-critic methods. In: ICML 2018, PMLR, pp 1587–1596 Goodfellow et al [2020] Goodfellow I, Pouget-Abadie J, Mirza M, et al (2020) Generative adversarial networks. Communications of the ACM 63(11):139–144 Haarnoja et al [2017] Haarnoja T, Tang H, Abbeel P, et al (2017) Reinforcement learning with deep energy-based policies. In: ICML 2017, PMLR, pp 1352–1361 Haarnoja et al [2018] Haarnoja T, Zhou A, Hartikainen K, et al (2018) Soft actor-critic algorithms and applications. arXiv preprint arXiv:181205905 Heess et al [2015] Heess N, Wayne G, Silver D, et al (2015) Learning continuous control policies by stochastic value gradients. Advances in neural information processing systems 28 Kingma and Welling [2013] Kingma DP, Welling M (2013) Auto-encoding variational bayes. 
arXiv preprint arXiv:13126114 Kingma et al [2014] Kingma DP, Mohamed S, Jimenez Rezende D, et al (2014) Semi-supervised learning with deep generative models. NIPS 2014 27 Konda and Tsitsiklis [1999] Konda V, Tsitsiklis J (1999) Actor-critic algorithms. Advances in neural information processing systems 12 Kuznetsov et al [2020] Kuznetsov A, Shvechikov P, Grishin A, et al (2020) Controlling overestimation bias with truncated mixture of continuous distributional quantile critics. In: International Conference on Machine Learning, PMLR, pp 5556–5566 Lillicrap et al [2015] Lillicrap TP, Hunt JJ, Pritzel A, et al (2015) Continuous control with deep reinforcement learning. arXiv preprint arXiv:150902971 Lindenberg et al [2022] Lindenberg B, Nordqvist J, Lindahl KO (2022) Conjugated discrete distributions for distributional reinforcement learning. In: AAAI 2022, pp 7516–7524 Ma et al [2020] Ma X, Xia L, Zhou Z, et al (2020) Dsac: distributional soft actor critic for risk-sensitive reinforcement learning. arXiv preprint arXiv:200414547 Marzouk et al [2016] Marzouk Y, Moselhy T, Parno M, et al (2016) Sampling via measure transport: An introduction. Handbook of uncertainty quantification 1:2 Mnih et al [2015] Mnih V, Kavukcuoglu K, Silver D, et al (2015) Human-level control through deep reinforcement learning. nature 518(7540):529–533 MORIMURA [2010] MORIMURA T (2010) Parametric return density estimation for reinforcement learning. In: Conference on Uncertainty in Artificial Intelligence, 2010 Nam et al [2021] Nam DW, Kim Y, Park CY (2021) Gmac: A distributional perspective on actor-critic framework. In: ICML 2021, PMLR, pp 7927–7936 Ororbia and Kifer [2022] Ororbia A, Kifer D (2022) The neural coding framework for learning generative models. Nature communications 13(1):2064 Parno and Marzouk [2018] Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. 
In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Engel Y, Mannor S, Meir R (2005) Reinforcement learning with gaussian processes. In: Proceedings of the 22nd international conference on Machine learning, pp 201–208 Fujimoto et al [2018] Fujimoto S, Hoof H, Meger D (2018) Addressing function approximation error in actor-critic methods. In: ICML 2018, PMLR, pp 1587–1596 Goodfellow et al [2020] Goodfellow I, Pouget-Abadie J, Mirza M, et al (2020) Generative adversarial networks. Communications of the ACM 63(11):139–144 Haarnoja et al [2017] Haarnoja T, Tang H, Abbeel P, et al (2017) Reinforcement learning with deep energy-based policies. In: ICML 2017, PMLR, pp 1352–1361 Haarnoja et al [2018] Haarnoja T, Zhou A, Hartikainen K, et al (2018) Soft actor-critic algorithms and applications. arXiv preprint arXiv:181205905 Heess et al [2015] Heess N, Wayne G, Silver D, et al (2015) Learning continuous control policies by stochastic value gradients. Advances in neural information processing systems 28 Kingma and Welling [2013] Kingma DP, Welling M (2013) Auto-encoding variational bayes. arXiv preprint arXiv:13126114 Kingma et al [2014] Kingma DP, Mohamed S, Jimenez Rezende D, et al (2014) Semi-supervised learning with deep generative models. NIPS 2014 27 Konda and Tsitsiklis [1999] Konda V, Tsitsiklis J (1999) Actor-critic algorithms. Advances in neural information processing systems 12 Kuznetsov et al [2020] Kuznetsov A, Shvechikov P, Grishin A, et al (2020) Controlling overestimation bias with truncated mixture of continuous distributional quantile critics. In: International Conference on Machine Learning, PMLR, pp 5556–5566 Lillicrap et al [2015] Lillicrap TP, Hunt JJ, Pritzel A, et al (2015) Continuous control with deep reinforcement learning. arXiv preprint arXiv:150902971 Lindenberg et al [2022] Lindenberg B, Nordqvist J, Lindahl KO (2022) Conjugated discrete distributions for distributional reinforcement learning. In: AAAI 2022, pp 7516–7524 Ma et al [2020] Ma X, Xia L, Zhou Z, et al (2020) Dsac: distributional soft actor critic for risk-sensitive reinforcement learning. arXiv preprint arXiv:200414547 Marzouk et al [2016] Marzouk Y, Moselhy T, Parno M, et al (2016) Sampling via measure transport: An introduction. Handbook of uncertainty quantification 1:2 Mnih et al [2015] Mnih V, Kavukcuoglu K, Silver D, et al (2015) Human-level control through deep reinforcement learning. nature 518(7540):529–533 MORIMURA [2010] MORIMURA T (2010) Parametric return density estimation for reinforcement learning. 
In: Conference on Uncertainty in Artificial Intelligence, 2010 Nam et al [2021] Nam DW, Kim Y, Park CY (2021) Gmac: A distributional perspective on actor-critic framework. In: ICML 2021, PMLR, pp 7927–7936 Ororbia and Kifer [2022] Ororbia A, Kifer D (2022) The neural coding framework for learning generative models. Nature communications 13(1):2064 Parno and Marzouk [2018] Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Fujimoto S, Hoof H, Meger D (2018) Addressing function approximation error in actor-critic methods. In: ICML 2018, PMLR, pp 1587–1596 Goodfellow et al [2020] Goodfellow I, Pouget-Abadie J, Mirza M, et al (2020) Generative adversarial networks. Communications of the ACM 63(11):139–144 Haarnoja et al [2017] Haarnoja T, Tang H, Abbeel P, et al (2017) Reinforcement learning with deep energy-based policies. In: ICML 2017, PMLR, pp 1352–1361 Haarnoja et al [2018] Haarnoja T, Zhou A, Hartikainen K, et al (2018) Soft actor-critic algorithms and applications. 
arXiv preprint arXiv:181205905 Heess et al [2015] Heess N, Wayne G, Silver D, et al (2015) Learning continuous control policies by stochastic value gradients. Advances in neural information processing systems 28 Kingma and Welling [2013] Kingma DP, Welling M (2013) Auto-encoding variational bayes. arXiv preprint arXiv:13126114 Kingma et al [2014] Kingma DP, Mohamed S, Jimenez Rezende D, et al (2014) Semi-supervised learning with deep generative models. NIPS 2014 27 Konda and Tsitsiklis [1999] Konda V, Tsitsiklis J (1999) Actor-critic algorithms. Advances in neural information processing systems 12 Kuznetsov et al [2020] Kuznetsov A, Shvechikov P, Grishin A, et al (2020) Controlling overestimation bias with truncated mixture of continuous distributional quantile critics. In: International Conference on Machine Learning, PMLR, pp 5556–5566 Lillicrap et al [2015] Lillicrap TP, Hunt JJ, Pritzel A, et al (2015) Continuous control with deep reinforcement learning. arXiv preprint arXiv:150902971 Lindenberg et al [2022] Lindenberg B, Nordqvist J, Lindahl KO (2022) Conjugated discrete distributions for distributional reinforcement learning. In: AAAI 2022, pp 7516–7524 Ma et al [2020] Ma X, Xia L, Zhou Z, et al (2020) Dsac: distributional soft actor critic for risk-sensitive reinforcement learning. arXiv preprint arXiv:200414547 Marzouk et al [2016] Marzouk Y, Moselhy T, Parno M, et al (2016) Sampling via measure transport: An introduction. Handbook of uncertainty quantification 1:2 Mnih et al [2015] Mnih V, Kavukcuoglu K, Silver D, et al (2015) Human-level control through deep reinforcement learning. nature 518(7540):529–533 MORIMURA [2010] MORIMURA T (2010) Parametric return density estimation for reinforcement learning. In: Conference on Uncertainty in Artificial Intelligence, 2010 Nam et al [2021] Nam DW, Kim Y, Park CY (2021) Gmac: A distributional perspective on actor-critic framework. In: ICML 2021, PMLR, pp 7927–7936 Ororbia and Kifer [2022] Ororbia A, Kifer D (2022) The neural coding framework for learning generative models. Nature communications 13(1):2064 Parno and Marzouk [2018] Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. 
arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Goodfellow I, Pouget-Abadie J, Mirza M, et al (2020) Generative adversarial networks. Communications of the ACM 63(11):139–144 Haarnoja et al [2017] Haarnoja T, Tang H, Abbeel P, et al (2017) Reinforcement learning with deep energy-based policies. In: ICML 2017, PMLR, pp 1352–1361 Haarnoja et al [2018] Haarnoja T, Zhou A, Hartikainen K, et al (2018) Soft actor-critic algorithms and applications. arXiv preprint arXiv:181205905 Heess et al [2015] Heess N, Wayne G, Silver D, et al (2015) Learning continuous control policies by stochastic value gradients. Advances in neural information processing systems 28 Kingma and Welling [2013] Kingma DP, Welling M (2013) Auto-encoding variational bayes. arXiv preprint arXiv:13126114 Kingma et al [2014] Kingma DP, Mohamed S, Jimenez Rezende D, et al (2014) Semi-supervised learning with deep generative models. NIPS 2014 27 Konda and Tsitsiklis [1999] Konda V, Tsitsiklis J (1999) Actor-critic algorithms. Advances in neural information processing systems 12 Kuznetsov et al [2020] Kuznetsov A, Shvechikov P, Grishin A, et al (2020) Controlling overestimation bias with truncated mixture of continuous distributional quantile critics. In: International Conference on Machine Learning, PMLR, pp 5556–5566 Lillicrap et al [2015] Lillicrap TP, Hunt JJ, Pritzel A, et al (2015) Continuous control with deep reinforcement learning. arXiv preprint arXiv:150902971 Lindenberg et al [2022] Lindenberg B, Nordqvist J, Lindahl KO (2022) Conjugated discrete distributions for distributional reinforcement learning. In: AAAI 2022, pp 7516–7524 Ma et al [2020] Ma X, Xia L, Zhou Z, et al (2020) Dsac: distributional soft actor critic for risk-sensitive reinforcement learning. arXiv preprint arXiv:200414547 Marzouk et al [2016] Marzouk Y, Moselhy T, Parno M, et al (2016) Sampling via measure transport: An introduction. Handbook of uncertainty quantification 1:2 Mnih et al [2015] Mnih V, Kavukcuoglu K, Silver D, et al (2015) Human-level control through deep reinforcement learning. nature 518(7540):529–533 MORIMURA [2010] MORIMURA T (2010) Parametric return density estimation for reinforcement learning. In: Conference on Uncertainty in Artificial Intelligence, 2010 Nam et al [2021] Nam DW, Kim Y, Park CY (2021) Gmac: A distributional perspective on actor-critic framework. 
In: ICML 2021, PMLR, pp 7927–7936 Ororbia and Kifer [2022] Ororbia A, Kifer D (2022) The neural coding framework for learning generative models. Nature communications 13(1):2064 Parno and Marzouk [2018] Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Haarnoja T, Tang H, Abbeel P, et al (2017) Reinforcement learning with deep energy-based policies. In: ICML 2017, PMLR, pp 1352–1361 Haarnoja et al [2018] Haarnoja T, Zhou A, Hartikainen K, et al (2018) Soft actor-critic algorithms and applications. arXiv preprint arXiv:181205905 Heess et al [2015] Heess N, Wayne G, Silver D, et al (2015) Learning continuous control policies by stochastic value gradients. Advances in neural information processing systems 28 Kingma and Welling [2013] Kingma DP, Welling M (2013) Auto-encoding variational bayes. arXiv preprint arXiv:13126114 Kingma et al [2014] Kingma DP, Mohamed S, Jimenez Rezende D, et al (2014) Semi-supervised learning with deep generative models. NIPS 2014 27 Konda and Tsitsiklis [1999] Konda V, Tsitsiklis J (1999) Actor-critic algorithms. 
Advances in neural information processing systems 12 Kuznetsov et al [2020] Kuznetsov A, Shvechikov P, Grishin A, et al (2020) Controlling overestimation bias with truncated mixture of continuous distributional quantile critics. In: International Conference on Machine Learning, PMLR, pp 5556–5566 Lillicrap et al [2015] Lillicrap TP, Hunt JJ, Pritzel A, et al (2015) Continuous control with deep reinforcement learning. arXiv preprint arXiv:150902971 Lindenberg et al [2022] Lindenberg B, Nordqvist J, Lindahl KO (2022) Conjugated discrete distributions for distributional reinforcement learning. In: AAAI 2022, pp 7516–7524 Ma et al [2020] Ma X, Xia L, Zhou Z, et al (2020) Dsac: distributional soft actor critic for risk-sensitive reinforcement learning. arXiv preprint arXiv:200414547 Marzouk et al [2016] Marzouk Y, Moselhy T, Parno M, et al (2016) Sampling via measure transport: An introduction. Handbook of uncertainty quantification 1:2 Mnih et al [2015] Mnih V, Kavukcuoglu K, Silver D, et al (2015) Human-level control through deep reinforcement learning. nature 518(7540):529–533 MORIMURA [2010] MORIMURA T (2010) Parametric return density estimation for reinforcement learning. In: Conference on Uncertainty in Artificial Intelligence, 2010 Nam et al [2021] Nam DW, Kim Y, Park CY (2021) Gmac: A distributional perspective on actor-critic framework. In: ICML 2021, PMLR, pp 7927–7936 Ororbia and Kifer [2022] Ororbia A, Kifer D (2022) The neural coding framework for learning generative models. Nature communications 13(1):2064 Parno and Marzouk [2018] Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. 
MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Haarnoja T, Zhou A, Hartikainen K, et al (2018) Soft actor-critic algorithms and applications. arXiv preprint arXiv:181205905 Heess et al [2015] Heess N, Wayne G, Silver D, et al (2015) Learning continuous control policies by stochastic value gradients. Advances in neural information processing systems 28 Kingma and Welling [2013] Kingma DP, Welling M (2013) Auto-encoding variational bayes. arXiv preprint arXiv:13126114 Kingma et al [2014] Kingma DP, Mohamed S, Jimenez Rezende D, et al (2014) Semi-supervised learning with deep generative models. NIPS 2014 27 Konda and Tsitsiklis [1999] Konda V, Tsitsiklis J (1999) Actor-critic algorithms. Advances in neural information processing systems 12 Kuznetsov et al [2020] Kuznetsov A, Shvechikov P, Grishin A, et al (2020) Controlling overestimation bias with truncated mixture of continuous distributional quantile critics. In: International Conference on Machine Learning, PMLR, pp 5556–5566 Lillicrap et al [2015] Lillicrap TP, Hunt JJ, Pritzel A, et al (2015) Continuous control with deep reinforcement learning. arXiv preprint arXiv:150902971 Lindenberg et al [2022] Lindenberg B, Nordqvist J, Lindahl KO (2022) Conjugated discrete distributions for distributional reinforcement learning. In: AAAI 2022, pp 7516–7524 Ma et al [2020] Ma X, Xia L, Zhou Z, et al (2020) Dsac: distributional soft actor critic for risk-sensitive reinforcement learning. arXiv preprint arXiv:200414547 Marzouk et al [2016] Marzouk Y, Moselhy T, Parno M, et al (2016) Sampling via measure transport: An introduction. Handbook of uncertainty quantification 1:2 Mnih et al [2015] Mnih V, Kavukcuoglu K, Silver D, et al (2015) Human-level control through deep reinforcement learning. nature 518(7540):529–533 MORIMURA [2010] MORIMURA T (2010) Parametric return density estimation for reinforcement learning. In: Conference on Uncertainty in Artificial Intelligence, 2010 Nam et al [2021] Nam DW, Kim Y, Park CY (2021) Gmac: A distributional perspective on actor-critic framework. In: ICML 2021, PMLR, pp 7927–7936 Ororbia and Kifer [2022] Ororbia A, Kifer D (2022) The neural coding framework for learning generative models. Nature communications 13(1):2064 Parno and Marzouk [2018] Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. 
Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Heess N, Wayne G, Silver D, et al (2015) Learning continuous control policies by stochastic value gradients. Advances in neural information processing systems 28 Kingma and Welling [2013] Kingma DP, Welling M (2013) Auto-encoding variational bayes. arXiv preprint arXiv:13126114 Kingma et al [2014] Kingma DP, Mohamed S, Jimenez Rezende D, et al (2014) Semi-supervised learning with deep generative models. NIPS 2014 27 Konda and Tsitsiklis [1999] Konda V, Tsitsiklis J (1999) Actor-critic algorithms. Advances in neural information processing systems 12 Kuznetsov et al [2020] Kuznetsov A, Shvechikov P, Grishin A, et al (2020) Controlling overestimation bias with truncated mixture of continuous distributional quantile critics. In: International Conference on Machine Learning, PMLR, pp 5556–5566 Lillicrap et al [2015] Lillicrap TP, Hunt JJ, Pritzel A, et al (2015) Continuous control with deep reinforcement learning. arXiv preprint arXiv:150902971 Lindenberg et al [2022] Lindenberg B, Nordqvist J, Lindahl KO (2022) Conjugated discrete distributions for distributional reinforcement learning. In: AAAI 2022, pp 7516–7524 Ma et al [2020] Ma X, Xia L, Zhou Z, et al (2020) Dsac: distributional soft actor critic for risk-sensitive reinforcement learning. arXiv preprint arXiv:200414547 Marzouk et al [2016] Marzouk Y, Moselhy T, Parno M, et al (2016) Sampling via measure transport: An introduction. Handbook of uncertainty quantification 1:2 Mnih et al [2015] Mnih V, Kavukcuoglu K, Silver D, et al (2015) Human-level control through deep reinforcement learning. nature 518(7540):529–533 MORIMURA [2010] MORIMURA T (2010) Parametric return density estimation for reinforcement learning. 
Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Marzouk Y, Moselhy T, Parno M, et al (2016) Sampling via measure transport: An introduction. Handbook of uncertainty quantification 1:2 Mnih et al [2015] Mnih V, Kavukcuoglu K, Silver D, et al (2015) Human-level control through deep reinforcement learning. nature 518(7540):529–533 MORIMURA [2010] MORIMURA T (2010) Parametric return density estimation for reinforcement learning. In: Conference on Uncertainty in Artificial Intelligence, 2010 Nam et al [2021] Nam DW, Kim Y, Park CY (2021) Gmac: A distributional perspective on actor-critic framework. In: ICML 2021, PMLR, pp 7927–7936 Ororbia and Kifer [2022] Ororbia A, Kifer D (2022) The neural coding framework for learning generative models. Nature communications 13(1):2064 Parno and Marzouk [2018] Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. 
Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Mnih V, Kavukcuoglu K, Silver D, et al (2015) Human-level control through deep reinforcement learning. nature 518(7540):529–533 MORIMURA [2010] MORIMURA T (2010) Parametric return density estimation for reinforcement learning. In: Conference on Uncertainty in Artificial Intelligence, 2010 Nam et al [2021] Nam DW, Kim Y, Park CY (2021) Gmac: A distributional perspective on actor-critic framework. In: ICML 2021, PMLR, pp 7927–7936 Ororbia and Kifer [2022] Ororbia A, Kifer D (2022) The neural coding framework for learning generative models. Nature communications 13(1):2064 Parno and Marzouk [2018] Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University MORIMURA T (2010) Parametric return density estimation for reinforcement learning. 
In: Conference on Uncertainty in Artificial Intelligence, 2010 Nam et al [2021] Nam DW, Kim Y, Park CY (2021) Gmac: A distributional perspective on actor-critic framework. In: ICML 2021, PMLR, pp 7927–7936 Ororbia and Kifer [2022] Ororbia A, Kifer D (2022) The neural coding framework for learning generative models. Nature communications 13(1):2064 Parno and Marzouk [2018] Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Nam DW, Kim Y, Park CY (2021) Gmac: A distributional perspective on actor-critic framework. In: ICML 2021, PMLR, pp 7927–7936 Ororbia and Kifer [2022] Ororbia A, Kifer D (2022) The neural coding framework for learning generative models. Nature communications 13(1):2064 Parno and Marzouk [2018] Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. 
Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Ororbia A, Kifer D (2022) The neural coding framework for learning generative models. Nature communications 13(1):2064 Parno and Marzouk [2018] Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. 
Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. 
Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. 
In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. 
Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. 
Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. 
In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. 
Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University
- Engel Y, Mannor S, Meir R (2005) Reinforcement learning with gaussian processes. In: Proceedings of the 22nd international conference on Machine learning, pp 201–208 Fujimoto et al [2018] Fujimoto S, Hoof H, Meger D (2018) Addressing function approximation error in actor-critic methods. In: ICML 2018, PMLR, pp 1587–1596 Goodfellow et al [2020] Goodfellow I, Pouget-Abadie J, Mirza M, et al (2020) Generative adversarial networks. Communications of the ACM 63(11):139–144 Haarnoja et al [2017] Haarnoja T, Tang H, Abbeel P, et al (2017) Reinforcement learning with deep energy-based policies. In: ICML 2017, PMLR, pp 1352–1361 Haarnoja et al [2018] Haarnoja T, Zhou A, Hartikainen K, et al (2018) Soft actor-critic algorithms and applications. arXiv preprint arXiv:181205905 Heess et al [2015] Heess N, Wayne G, Silver D, et al (2015) Learning continuous control policies by stochastic value gradients. Advances in neural information processing systems 28 Kingma and Welling [2013] Kingma DP, Welling M (2013) Auto-encoding variational bayes. arXiv preprint arXiv:13126114 Kingma et al [2014] Kingma DP, Mohamed S, Jimenez Rezende D, et al (2014) Semi-supervised learning with deep generative models. NIPS 2014 27 Konda and Tsitsiklis [1999] Konda V, Tsitsiklis J (1999) Actor-critic algorithms. Advances in neural information processing systems 12 Kuznetsov et al [2020] Kuznetsov A, Shvechikov P, Grishin A, et al (2020) Controlling overestimation bias with truncated mixture of continuous distributional quantile critics. In: International Conference on Machine Learning, PMLR, pp 5556–5566 Lillicrap et al [2015] Lillicrap TP, Hunt JJ, Pritzel A, et al (2015) Continuous control with deep reinforcement learning. arXiv preprint arXiv:150902971 Lindenberg et al [2022] Lindenberg B, Nordqvist J, Lindahl KO (2022) Conjugated discrete distributions for distributional reinforcement learning. In: AAAI 2022, pp 7516–7524 Ma et al [2020] Ma X, Xia L, Zhou Z, et al (2020) Dsac: distributional soft actor critic for risk-sensitive reinforcement learning. arXiv preprint arXiv:200414547 Marzouk et al [2016] Marzouk Y, Moselhy T, Parno M, et al (2016) Sampling via measure transport: An introduction. Handbook of uncertainty quantification 1:2 Mnih et al [2015] Mnih V, Kavukcuoglu K, Silver D, et al (2015) Human-level control through deep reinforcement learning. nature 518(7540):529–533 MORIMURA [2010] MORIMURA T (2010) Parametric return density estimation for reinforcement learning. In: Conference on Uncertainty in Artificial Intelligence, 2010 Nam et al [2021] Nam DW, Kim Y, Park CY (2021) Gmac: A distributional perspective on actor-critic framework. In: ICML 2021, PMLR, pp 7927–7936 Ororbia and Kifer [2022] Ororbia A, Kifer D (2022) The neural coding framework for learning generative models. Nature communications 13(1):2064 Parno and Marzouk [2018] Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. 
Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Fujimoto S, Hoof H, Meger D (2018) Addressing function approximation error in actor-critic methods. In: ICML 2018, PMLR, pp 1587–1596 Goodfellow et al [2020] Goodfellow I, Pouget-Abadie J, Mirza M, et al (2020) Generative adversarial networks. Communications of the ACM 63(11):139–144 Haarnoja et al [2017] Haarnoja T, Tang H, Abbeel P, et al (2017) Reinforcement learning with deep energy-based policies. In: ICML 2017, PMLR, pp 1352–1361 Haarnoja et al [2018] Haarnoja T, Zhou A, Hartikainen K, et al (2018) Soft actor-critic algorithms and applications. arXiv preprint arXiv:181205905 Heess et al [2015] Heess N, Wayne G, Silver D, et al (2015) Learning continuous control policies by stochastic value gradients. Advances in neural information processing systems 28 Kingma and Welling [2013] Kingma DP, Welling M (2013) Auto-encoding variational bayes. arXiv preprint arXiv:13126114 Kingma et al [2014] Kingma DP, Mohamed S, Jimenez Rezende D, et al (2014) Semi-supervised learning with deep generative models. NIPS 2014 27 Konda and Tsitsiklis [1999] Konda V, Tsitsiklis J (1999) Actor-critic algorithms. Advances in neural information processing systems 12 Kuznetsov et al [2020] Kuznetsov A, Shvechikov P, Grishin A, et al (2020) Controlling overestimation bias with truncated mixture of continuous distributional quantile critics. In: International Conference on Machine Learning, PMLR, pp 5556–5566 Lillicrap et al [2015] Lillicrap TP, Hunt JJ, Pritzel A, et al (2015) Continuous control with deep reinforcement learning. 
arXiv preprint arXiv:150902971 Lindenberg et al [2022] Lindenberg B, Nordqvist J, Lindahl KO (2022) Conjugated discrete distributions for distributional reinforcement learning. In: AAAI 2022, pp 7516–7524 Ma et al [2020] Ma X, Xia L, Zhou Z, et al (2020) Dsac: distributional soft actor critic for risk-sensitive reinforcement learning. arXiv preprint arXiv:200414547 Marzouk et al [2016] Marzouk Y, Moselhy T, Parno M, et al (2016) Sampling via measure transport: An introduction. Handbook of uncertainty quantification 1:2 Mnih et al [2015] Mnih V, Kavukcuoglu K, Silver D, et al (2015) Human-level control through deep reinforcement learning. nature 518(7540):529–533 MORIMURA [2010] MORIMURA T (2010) Parametric return density estimation for reinforcement learning. In: Conference on Uncertainty in Artificial Intelligence, 2010 Nam et al [2021] Nam DW, Kim Y, Park CY (2021) Gmac: A distributional perspective on actor-critic framework. In: ICML 2021, PMLR, pp 7927–7936 Ororbia and Kifer [2022] Ororbia A, Kifer D (2022) The neural coding framework for learning generative models. Nature communications 13(1):2064 Parno and Marzouk [2018] Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. 
Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Goodfellow I, Pouget-Abadie J, Mirza M, et al (2020) Generative adversarial networks. Communications of the ACM 63(11):139–144 Haarnoja et al [2017] Haarnoja T, Tang H, Abbeel P, et al (2017) Reinforcement learning with deep energy-based policies. In: ICML 2017, PMLR, pp 1352–1361 Haarnoja et al [2018] Haarnoja T, Zhou A, Hartikainen K, et al (2018) Soft actor-critic algorithms and applications. arXiv preprint arXiv:181205905 Heess et al [2015] Heess N, Wayne G, Silver D, et al (2015) Learning continuous control policies by stochastic value gradients. Advances in neural information processing systems 28 Kingma and Welling [2013] Kingma DP, Welling M (2013) Auto-encoding variational bayes. arXiv preprint arXiv:13126114 Kingma et al [2014] Kingma DP, Mohamed S, Jimenez Rezende D, et al (2014) Semi-supervised learning with deep generative models. NIPS 2014 27 Konda and Tsitsiklis [1999] Konda V, Tsitsiklis J (1999) Actor-critic algorithms. Advances in neural information processing systems 12 Kuznetsov et al [2020] Kuznetsov A, Shvechikov P, Grishin A, et al (2020) Controlling overestimation bias with truncated mixture of continuous distributional quantile critics. In: International Conference on Machine Learning, PMLR, pp 5556–5566 Lillicrap et al [2015] Lillicrap TP, Hunt JJ, Pritzel A, et al (2015) Continuous control with deep reinforcement learning. arXiv preprint arXiv:150902971 Lindenberg et al [2022] Lindenberg B, Nordqvist J, Lindahl KO (2022) Conjugated discrete distributions for distributional reinforcement learning. In: AAAI 2022, pp 7516–7524 Ma et al [2020] Ma X, Xia L, Zhou Z, et al (2020) Dsac: distributional soft actor critic for risk-sensitive reinforcement learning. arXiv preprint arXiv:200414547 Marzouk et al [2016] Marzouk Y, Moselhy T, Parno M, et al (2016) Sampling via measure transport: An introduction. Handbook of uncertainty quantification 1:2 Mnih et al [2015] Mnih V, Kavukcuoglu K, Silver D, et al (2015) Human-level control through deep reinforcement learning. nature 518(7540):529–533 MORIMURA [2010] MORIMURA T (2010) Parametric return density estimation for reinforcement learning. In: Conference on Uncertainty in Artificial Intelligence, 2010 Nam et al [2021] Nam DW, Kim Y, Park CY (2021) Gmac: A distributional perspective on actor-critic framework. In: ICML 2021, PMLR, pp 7927–7936 Ororbia and Kifer [2022] Ororbia A, Kifer D (2022) The neural coding framework for learning generative models. Nature communications 13(1):2064 Parno and Marzouk [2018] Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. 
Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. 
arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. 
In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University
- Fujimoto S, Hoof H, Meger D (2018) Addressing function approximation error in actor-critic methods. In: ICML 2018, PMLR, pp 1587–1596 Goodfellow et al [2020] Goodfellow I, Pouget-Abadie J, Mirza M, et al (2020) Generative adversarial networks. Communications of the ACM 63(11):139–144 Haarnoja et al [2017] Haarnoja T, Tang H, Abbeel P, et al (2017) Reinforcement learning with deep energy-based policies. In: ICML 2017, PMLR, pp 1352–1361 Haarnoja et al [2018] Haarnoja T, Zhou A, Hartikainen K, et al (2018) Soft actor-critic algorithms and applications. arXiv preprint arXiv:181205905 Heess et al [2015] Heess N, Wayne G, Silver D, et al (2015) Learning continuous control policies by stochastic value gradients. Advances in neural information processing systems 28 Kingma and Welling [2013] Kingma DP, Welling M (2013) Auto-encoding variational bayes. arXiv preprint arXiv:13126114 Kingma et al [2014] Kingma DP, Mohamed S, Jimenez Rezende D, et al (2014) Semi-supervised learning with deep generative models. NIPS 2014 27 Konda and Tsitsiklis [1999] Konda V, Tsitsiklis J (1999) Actor-critic algorithms. Advances in neural information processing systems 12 Kuznetsov et al [2020] Kuznetsov A, Shvechikov P, Grishin A, et al (2020) Controlling overestimation bias with truncated mixture of continuous distributional quantile critics. In: International Conference on Machine Learning, PMLR, pp 5556–5566 Lillicrap et al [2015] Lillicrap TP, Hunt JJ, Pritzel A, et al (2015) Continuous control with deep reinforcement learning. arXiv preprint arXiv:150902971 Lindenberg et al [2022] Lindenberg B, Nordqvist J, Lindahl KO (2022) Conjugated discrete distributions for distributional reinforcement learning. In: AAAI 2022, pp 7516–7524 Ma et al [2020] Ma X, Xia L, Zhou Z, et al (2020) Dsac: distributional soft actor critic for risk-sensitive reinforcement learning. arXiv preprint arXiv:200414547 Marzouk et al [2016] Marzouk Y, Moselhy T, Parno M, et al (2016) Sampling via measure transport: An introduction. Handbook of uncertainty quantification 1:2 Mnih et al [2015] Mnih V, Kavukcuoglu K, Silver D, et al (2015) Human-level control through deep reinforcement learning. nature 518(7540):529–533 MORIMURA [2010] MORIMURA T (2010) Parametric return density estimation for reinforcement learning. In: Conference on Uncertainty in Artificial Intelligence, 2010 Nam et al [2021] Nam DW, Kim Y, Park CY (2021) Gmac: A distributional perspective on actor-critic framework. In: ICML 2021, PMLR, pp 7927–7936 Ororbia and Kifer [2022] Ororbia A, Kifer D (2022) The neural coding framework for learning generative models. Nature communications 13(1):2064 Parno and Marzouk [2018] Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. 
Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Goodfellow I, Pouget-Abadie J, Mirza M, et al (2020) Generative adversarial networks. Communications of the ACM 63(11):139–144 Haarnoja et al [2017] Haarnoja T, Tang H, Abbeel P, et al (2017) Reinforcement learning with deep energy-based policies. In: ICML 2017, PMLR, pp 1352–1361 Haarnoja et al [2018] Haarnoja T, Zhou A, Hartikainen K, et al (2018) Soft actor-critic algorithms and applications. arXiv preprint arXiv:181205905 Heess et al [2015] Heess N, Wayne G, Silver D, et al (2015) Learning continuous control policies by stochastic value gradients. Advances in neural information processing systems 28 Kingma and Welling [2013] Kingma DP, Welling M (2013) Auto-encoding variational bayes. arXiv preprint arXiv:13126114 Kingma et al [2014] Kingma DP, Mohamed S, Jimenez Rezende D, et al (2014) Semi-supervised learning with deep generative models. NIPS 2014 27 Konda and Tsitsiklis [1999] Konda V, Tsitsiklis J (1999) Actor-critic algorithms. Advances in neural information processing systems 12 Kuznetsov et al [2020] Kuznetsov A, Shvechikov P, Grishin A, et al (2020) Controlling overestimation bias with truncated mixture of continuous distributional quantile critics. In: International Conference on Machine Learning, PMLR, pp 5556–5566 Lillicrap et al [2015] Lillicrap TP, Hunt JJ, Pritzel A, et al (2015) Continuous control with deep reinforcement learning. arXiv preprint arXiv:150902971 Lindenberg et al [2022] Lindenberg B, Nordqvist J, Lindahl KO (2022) Conjugated discrete distributions for distributional reinforcement learning. In: AAAI 2022, pp 7516–7524 Ma et al [2020] Ma X, Xia L, Zhou Z, et al (2020) Dsac: distributional soft actor critic for risk-sensitive reinforcement learning. 
arXiv preprint arXiv:200414547 Marzouk et al [2016] Marzouk Y, Moselhy T, Parno M, et al (2016) Sampling via measure transport: An introduction. Handbook of uncertainty quantification 1:2 Mnih et al [2015] Mnih V, Kavukcuoglu K, Silver D, et al (2015) Human-level control through deep reinforcement learning. nature 518(7540):529–533 MORIMURA [2010] MORIMURA T (2010) Parametric return density estimation for reinforcement learning. In: Conference on Uncertainty in Artificial Intelligence, 2010 Nam et al [2021] Nam DW, Kim Y, Park CY (2021) Gmac: A distributional perspective on actor-critic framework. In: ICML 2021, PMLR, pp 7927–7936 Ororbia and Kifer [2022] Ororbia A, Kifer D (2022) The neural coding framework for learning generative models. Nature communications 13(1):2064 Parno and Marzouk [2018] Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Haarnoja T, Tang H, Abbeel P, et al (2017) Reinforcement learning with deep energy-based policies. 
In: ICML 2017, PMLR, pp 1352–1361 Haarnoja et al [2018] Haarnoja T, Zhou A, Hartikainen K, et al (2018) Soft actor-critic algorithms and applications. arXiv preprint arXiv:181205905 Heess et al [2015] Heess N, Wayne G, Silver D, et al (2015) Learning continuous control policies by stochastic value gradients. Advances in neural information processing systems 28 Kingma and Welling [2013] Kingma DP, Welling M (2013) Auto-encoding variational bayes. arXiv preprint arXiv:13126114 Kingma et al [2014] Kingma DP, Mohamed S, Jimenez Rezende D, et al (2014) Semi-supervised learning with deep generative models. NIPS 2014 27 Konda and Tsitsiklis [1999] Konda V, Tsitsiklis J (1999) Actor-critic algorithms. Advances in neural information processing systems 12 Kuznetsov et al [2020] Kuznetsov A, Shvechikov P, Grishin A, et al (2020) Controlling overestimation bias with truncated mixture of continuous distributional quantile critics. In: International Conference on Machine Learning, PMLR, pp 5556–5566 Lillicrap et al [2015] Lillicrap TP, Hunt JJ, Pritzel A, et al (2015) Continuous control with deep reinforcement learning. arXiv preprint arXiv:150902971 Lindenberg et al [2022] Lindenberg B, Nordqvist J, Lindahl KO (2022) Conjugated discrete distributions for distributional reinforcement learning. In: AAAI 2022, pp 7516–7524 Ma et al [2020] Ma X, Xia L, Zhou Z, et al (2020) Dsac: distributional soft actor critic for risk-sensitive reinforcement learning. arXiv preprint arXiv:200414547 Marzouk et al [2016] Marzouk Y, Moselhy T, Parno M, et al (2016) Sampling via measure transport: An introduction. Handbook of uncertainty quantification 1:2 Mnih et al [2015] Mnih V, Kavukcuoglu K, Silver D, et al (2015) Human-level control through deep reinforcement learning. nature 518(7540):529–533 MORIMURA [2010] MORIMURA T (2010) Parametric return density estimation for reinforcement learning. In: Conference on Uncertainty in Artificial Intelligence, 2010 Nam et al [2021] Nam DW, Kim Y, Park CY (2021) Gmac: A distributional perspective on actor-critic framework. In: ICML 2021, PMLR, pp 7927–7936 Ororbia and Kifer [2022] Ororbia A, Kifer D (2022) The neural coding framework for learning generative models. Nature communications 13(1):2064 Parno and Marzouk [2018] Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. 
Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Haarnoja T, Zhou A, Hartikainen K, et al (2018) Soft actor-critic algorithms and applications. arXiv preprint arXiv:181205905 Heess et al [2015] Heess N, Wayne G, Silver D, et al (2015) Learning continuous control policies by stochastic value gradients. Advances in neural information processing systems 28 Kingma and Welling [2013] Kingma DP, Welling M (2013) Auto-encoding variational bayes. arXiv preprint arXiv:13126114 Kingma et al [2014] Kingma DP, Mohamed S, Jimenez Rezende D, et al (2014) Semi-supervised learning with deep generative models. NIPS 2014 27 Konda and Tsitsiklis [1999] Konda V, Tsitsiklis J (1999) Actor-critic algorithms. Advances in neural information processing systems 12 Kuznetsov et al [2020] Kuznetsov A, Shvechikov P, Grishin A, et al (2020) Controlling overestimation bias with truncated mixture of continuous distributional quantile critics. In: International Conference on Machine Learning, PMLR, pp 5556–5566 Lillicrap et al [2015] Lillicrap TP, Hunt JJ, Pritzel A, et al (2015) Continuous control with deep reinforcement learning. arXiv preprint arXiv:150902971 Lindenberg et al [2022] Lindenberg B, Nordqvist J, Lindahl KO (2022) Conjugated discrete distributions for distributional reinforcement learning. In: AAAI 2022, pp 7516–7524 Ma et al [2020] Ma X, Xia L, Zhou Z, et al (2020) Dsac: distributional soft actor critic for risk-sensitive reinforcement learning. arXiv preprint arXiv:200414547 Marzouk et al [2016] Marzouk Y, Moselhy T, Parno M, et al (2016) Sampling via measure transport: An introduction. Handbook of uncertainty quantification 1:2 Mnih et al [2015] Mnih V, Kavukcuoglu K, Silver D, et al (2015) Human-level control through deep reinforcement learning. nature 518(7540):529–533 MORIMURA [2010] MORIMURA T (2010) Parametric return density estimation for reinforcement learning. In: Conference on Uncertainty in Artificial Intelligence, 2010 Nam et al [2021] Nam DW, Kim Y, Park CY (2021) Gmac: A distributional perspective on actor-critic framework. 
In: ICML 2021, PMLR, pp 7927–7936 Ororbia and Kifer [2022] Ororbia A, Kifer D (2022) The neural coding framework for learning generative models. Nature communications 13(1):2064 Parno and Marzouk [2018] Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Heess N, Wayne G, Silver D, et al (2015) Learning continuous control policies by stochastic value gradients. Advances in neural information processing systems 28 Kingma and Welling [2013] Kingma DP, Welling M (2013) Auto-encoding variational bayes. arXiv preprint arXiv:13126114 Kingma et al [2014] Kingma DP, Mohamed S, Jimenez Rezende D, et al (2014) Semi-supervised learning with deep generative models. NIPS 2014 27 Konda and Tsitsiklis [1999] Konda V, Tsitsiklis J (1999) Actor-critic algorithms. Advances in neural information processing systems 12 Kuznetsov et al [2020] Kuznetsov A, Shvechikov P, Grishin A, et al (2020) Controlling overestimation bias with truncated mixture of continuous distributional quantile critics. 
In: International Conference on Machine Learning, PMLR, pp 5556–5566 Lillicrap et al [2015] Lillicrap TP, Hunt JJ, Pritzel A, et al (2015) Continuous control with deep reinforcement learning. arXiv preprint arXiv:150902971 Lindenberg et al [2022] Lindenberg B, Nordqvist J, Lindahl KO (2022) Conjugated discrete distributions for distributional reinforcement learning. In: AAAI 2022, pp 7516–7524 Ma et al [2020] Ma X, Xia L, Zhou Z, et al (2020) Dsac: distributional soft actor critic for risk-sensitive reinforcement learning. arXiv preprint arXiv:200414547 Marzouk et al [2016] Marzouk Y, Moselhy T, Parno M, et al (2016) Sampling via measure transport: An introduction. Handbook of uncertainty quantification 1:2 Mnih et al [2015] Mnih V, Kavukcuoglu K, Silver D, et al (2015) Human-level control through deep reinforcement learning. nature 518(7540):529–533 MORIMURA [2010] MORIMURA T (2010) Parametric return density estimation for reinforcement learning. In: Conference on Uncertainty in Artificial Intelligence, 2010 Nam et al [2021] Nam DW, Kim Y, Park CY (2021) Gmac: A distributional perspective on actor-critic framework. In: ICML 2021, PMLR, pp 7927–7936 Ororbia and Kifer [2022] Ororbia A, Kifer D (2022) The neural coding framework for learning generative models. Nature communications 13(1):2064 Parno and Marzouk [2018] Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. 
NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Kingma DP, Welling M (2013) Auto-encoding variational bayes. arXiv preprint arXiv:13126114 Kingma et al [2014] Kingma DP, Mohamed S, Jimenez Rezende D, et al (2014) Semi-supervised learning with deep generative models. NIPS 2014 27 Konda and Tsitsiklis [1999] Konda V, Tsitsiklis J (1999) Actor-critic algorithms. Advances in neural information processing systems 12 Kuznetsov et al [2020] Kuznetsov A, Shvechikov P, Grishin A, et al (2020) Controlling overestimation bias with truncated mixture of continuous distributional quantile critics. In: International Conference on Machine Learning, PMLR, pp 5556–5566 Lillicrap et al [2015] Lillicrap TP, Hunt JJ, Pritzel A, et al (2015) Continuous control with deep reinforcement learning. arXiv preprint arXiv:150902971 Lindenberg et al [2022] Lindenberg B, Nordqvist J, Lindahl KO (2022) Conjugated discrete distributions for distributional reinforcement learning. In: AAAI 2022, pp 7516–7524 Ma et al [2020] Ma X, Xia L, Zhou Z, et al (2020) Dsac: distributional soft actor critic for risk-sensitive reinforcement learning. arXiv preprint arXiv:200414547 Marzouk et al [2016] Marzouk Y, Moselhy T, Parno M, et al (2016) Sampling via measure transport: An introduction. Handbook of uncertainty quantification 1:2 Mnih et al [2015] Mnih V, Kavukcuoglu K, Silver D, et al (2015) Human-level control through deep reinforcement learning. nature 518(7540):529–533 MORIMURA [2010] MORIMURA T (2010) Parametric return density estimation for reinforcement learning. In: Conference on Uncertainty in Artificial Intelligence, 2010 Nam et al [2021] Nam DW, Kim Y, Park CY (2021) Gmac: A distributional perspective on actor-critic framework. In: ICML 2021, PMLR, pp 7927–7936 Ororbia and Kifer [2022] Ororbia A, Kifer D (2022) The neural coding framework for learning generative models. Nature communications 13(1):2064 Parno and Marzouk [2018] Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. 
arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Kingma DP, Mohamed S, Jimenez Rezende D, et al (2014) Semi-supervised learning with deep generative models. NIPS 2014 27 Konda and Tsitsiklis [1999] Konda V, Tsitsiklis J (1999) Actor-critic algorithms. Advances in neural information processing systems 12 Kuznetsov et al [2020] Kuznetsov A, Shvechikov P, Grishin A, et al (2020) Controlling overestimation bias with truncated mixture of continuous distributional quantile critics. In: International Conference on Machine Learning, PMLR, pp 5556–5566 Lillicrap et al [2015] Lillicrap TP, Hunt JJ, Pritzel A, et al (2015) Continuous control with deep reinforcement learning. arXiv preprint arXiv:150902971 Lindenberg et al [2022] Lindenberg B, Nordqvist J, Lindahl KO (2022) Conjugated discrete distributions for distributional reinforcement learning. In: AAAI 2022, pp 7516–7524 Ma et al [2020] Ma X, Xia L, Zhou Z, et al (2020) Dsac: distributional soft actor critic for risk-sensitive reinforcement learning. arXiv preprint arXiv:200414547 Marzouk et al [2016] Marzouk Y, Moselhy T, Parno M, et al (2016) Sampling via measure transport: An introduction. Handbook of uncertainty quantification 1:2 Mnih et al [2015] Mnih V, Kavukcuoglu K, Silver D, et al (2015) Human-level control through deep reinforcement learning. nature 518(7540):529–533 MORIMURA [2010] MORIMURA T (2010) Parametric return density estimation for reinforcement learning. In: Conference on Uncertainty in Artificial Intelligence, 2010 Nam et al [2021] Nam DW, Kim Y, Park CY (2021) Gmac: A distributional perspective on actor-critic framework. In: ICML 2021, PMLR, pp 7927–7936 Ororbia and Kifer [2022] Ororbia A, Kifer D (2022) The neural coding framework for learning generative models. Nature communications 13(1):2064 Parno and Marzouk [2018] Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. 
Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Konda V, Tsitsiklis J (1999) Actor-critic algorithms. Advances in neural information processing systems 12 Kuznetsov et al [2020] Kuznetsov A, Shvechikov P, Grishin A, et al (2020) Controlling overestimation bias with truncated mixture of continuous distributional quantile critics. In: International Conference on Machine Learning, PMLR, pp 5556–5566 Lillicrap et al [2015] Lillicrap TP, Hunt JJ, Pritzel A, et al (2015) Continuous control with deep reinforcement learning. arXiv preprint arXiv:150902971 Lindenberg et al [2022] Lindenberg B, Nordqvist J, Lindahl KO (2022) Conjugated discrete distributions for distributional reinforcement learning. In: AAAI 2022, pp 7516–7524 Ma et al [2020] Ma X, Xia L, Zhou Z, et al (2020) Dsac: distributional soft actor critic for risk-sensitive reinforcement learning. arXiv preprint arXiv:200414547 Marzouk et al [2016] Marzouk Y, Moselhy T, Parno M, et al (2016) Sampling via measure transport: An introduction. Handbook of uncertainty quantification 1:2 Mnih et al [2015] Mnih V, Kavukcuoglu K, Silver D, et al (2015) Human-level control through deep reinforcement learning. nature 518(7540):529–533 MORIMURA [2010] MORIMURA T (2010) Parametric return density estimation for reinforcement learning. In: Conference on Uncertainty in Artificial Intelligence, 2010 Nam et al [2021] Nam DW, Kim Y, Park CY (2021) Gmac: A distributional perspective on actor-critic framework. 
In: ICML 2021, PMLR, pp 7927–7936 Ororbia and Kifer [2022] Ororbia A, Kifer D (2022) The neural coding framework for learning generative models. Nature communications 13(1):2064 Parno and Marzouk [2018] Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Kuznetsov A, Shvechikov P, Grishin A, et al (2020) Controlling overestimation bias with truncated mixture of continuous distributional quantile critics. In: International Conference on Machine Learning, PMLR, pp 5556–5566 Lillicrap et al [2015] Lillicrap TP, Hunt JJ, Pritzel A, et al (2015) Continuous control with deep reinforcement learning. arXiv preprint arXiv:150902971 Lindenberg et al [2022] Lindenberg B, Nordqvist J, Lindahl KO (2022) Conjugated discrete distributions for distributional reinforcement learning. In: AAAI 2022, pp 7516–7524 Ma et al [2020] Ma X, Xia L, Zhou Z, et al (2020) Dsac: distributional soft actor critic for risk-sensitive reinforcement learning. 
nature 518(7540):529–533 MORIMURA [2010] MORIMURA T (2010) Parametric return density estimation for reinforcement learning. In: Conference on Uncertainty in Artificial Intelligence, 2010 Nam et al [2021] Nam DW, Kim Y, Park CY (2021) Gmac: A distributional perspective on actor-critic framework. In: ICML 2021, PMLR, pp 7927–7936 Ororbia and Kifer [2022] Ororbia A, Kifer D (2022) The neural coding framework for learning generative models. Nature communications 13(1):2064 Parno and Marzouk [2018] Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Kingma DP, Mohamed S, Jimenez Rezende D, et al (2014) Semi-supervised learning with deep generative models. NIPS 2014 27 Konda and Tsitsiklis [1999] Konda V, Tsitsiklis J (1999) Actor-critic algorithms. Advances in neural information processing systems 12 Kuznetsov et al [2020] Kuznetsov A, Shvechikov P, Grishin A, et al (2020) Controlling overestimation bias with truncated mixture of continuous distributional quantile critics. 
In: International Conference on Machine Learning, PMLR, pp 5556–5566 Lillicrap et al [2015] Lillicrap TP, Hunt JJ, Pritzel A, et al (2015) Continuous control with deep reinforcement learning. arXiv preprint arXiv:150902971 Lindenberg et al [2022] Lindenberg B, Nordqvist J, Lindahl KO (2022) Conjugated discrete distributions for distributional reinforcement learning. In: AAAI 2022, pp 7516–7524 Ma et al [2020] Ma X, Xia L, Zhou Z, et al (2020) Dsac: distributional soft actor critic for risk-sensitive reinforcement learning. arXiv preprint arXiv:200414547 Marzouk et al [2016] Marzouk Y, Moselhy T, Parno M, et al (2016) Sampling via measure transport: An introduction. Handbook of uncertainty quantification 1:2 Mnih et al [2015] Mnih V, Kavukcuoglu K, Silver D, et al (2015) Human-level control through deep reinforcement learning. nature 518(7540):529–533 MORIMURA [2010] MORIMURA T (2010) Parametric return density estimation for reinforcement learning. In: Conference on Uncertainty in Artificial Intelligence, 2010 Nam et al [2021] Nam DW, Kim Y, Park CY (2021) Gmac: A distributional perspective on actor-critic framework. In: ICML 2021, PMLR, pp 7927–7936 Ororbia and Kifer [2022] Ororbia A, Kifer D (2022) The neural coding framework for learning generative models. Nature communications 13(1):2064 Parno and Marzouk [2018] Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. 
NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Konda V, Tsitsiklis J (1999) Actor-critic algorithms. Advances in neural information processing systems 12 Kuznetsov et al [2020] Kuznetsov A, Shvechikov P, Grishin A, et al (2020) Controlling overestimation bias with truncated mixture of continuous distributional quantile critics. In: International Conference on Machine Learning, PMLR, pp 5556–5566 Lillicrap et al [2015] Lillicrap TP, Hunt JJ, Pritzel A, et al (2015) Continuous control with deep reinforcement learning. arXiv preprint arXiv:150902971 Lindenberg et al [2022] Lindenberg B, Nordqvist J, Lindahl KO (2022) Conjugated discrete distributions for distributional reinforcement learning. In: AAAI 2022, pp 7516–7524 Ma et al [2020] Ma X, Xia L, Zhou Z, et al (2020) Dsac: distributional soft actor critic for risk-sensitive reinforcement learning. arXiv preprint arXiv:200414547 Marzouk et al [2016] Marzouk Y, Moselhy T, Parno M, et al (2016) Sampling via measure transport: An introduction. Handbook of uncertainty quantification 1:2 Mnih et al [2015] Mnih V, Kavukcuoglu K, Silver D, et al (2015) Human-level control through deep reinforcement learning. nature 518(7540):529–533 MORIMURA [2010] MORIMURA T (2010) Parametric return density estimation for reinforcement learning. In: Conference on Uncertainty in Artificial Intelligence, 2010 Nam et al [2021] Nam DW, Kim Y, Park CY (2021) Gmac: A distributional perspective on actor-critic framework. In: ICML 2021, PMLR, pp 7927–7936 Ororbia and Kifer [2022] Ororbia A, Kifer D (2022) The neural coding framework for learning generative models. Nature communications 13(1):2064 Parno and Marzouk [2018] Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. 
In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Kuznetsov A, Shvechikov P, Grishin A, et al (2020) Controlling overestimation bias with truncated mixture of continuous distributional quantile critics. In: International Conference on Machine Learning, PMLR, pp 5556–5566 Lillicrap et al [2015] Lillicrap TP, Hunt JJ, Pritzel A, et al (2015) Continuous control with deep reinforcement learning. arXiv preprint arXiv:150902971 Lindenberg et al [2022] Lindenberg B, Nordqvist J, Lindahl KO (2022) Conjugated discrete distributions for distributional reinforcement learning. In: AAAI 2022, pp 7516–7524 Ma et al [2020] Ma X, Xia L, Zhou Z, et al (2020) Dsac: distributional soft actor critic for risk-sensitive reinforcement learning. arXiv preprint arXiv:200414547 Marzouk et al [2016] Marzouk Y, Moselhy T, Parno M, et al (2016) Sampling via measure transport: An introduction. Handbook of uncertainty quantification 1:2 Mnih et al [2015] Mnih V, Kavukcuoglu K, Silver D, et al (2015) Human-level control through deep reinforcement learning. nature 518(7540):529–533 MORIMURA [2010] MORIMURA T (2010) Parametric return density estimation for reinforcement learning. In: Conference on Uncertainty in Artificial Intelligence, 2010 Nam et al [2021] Nam DW, Kim Y, Park CY (2021) Gmac: A distributional perspective on actor-critic framework. In: ICML 2021, PMLR, pp 7927–7936 Ororbia and Kifer [2022] Ororbia A, Kifer D (2022) The neural coding framework for learning generative models. Nature communications 13(1):2064 Parno and Marzouk [2018] Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. 
Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Lillicrap TP, Hunt JJ, Pritzel A, et al (2015) Continuous control with deep reinforcement learning. arXiv preprint arXiv:150902971 Lindenberg et al [2022] Lindenberg B, Nordqvist J, Lindahl KO (2022) Conjugated discrete distributions for distributional reinforcement learning. In: AAAI 2022, pp 7516–7524 Ma et al [2020] Ma X, Xia L, Zhou Z, et al (2020) Dsac: distributional soft actor critic for risk-sensitive reinforcement learning. arXiv preprint arXiv:200414547 Marzouk et al [2016] Marzouk Y, Moselhy T, Parno M, et al (2016) Sampling via measure transport: An introduction. Handbook of uncertainty quantification 1:2 Mnih et al [2015] Mnih V, Kavukcuoglu K, Silver D, et al (2015) Human-level control through deep reinforcement learning. nature 518(7540):529–533 MORIMURA [2010] MORIMURA T (2010) Parametric return density estimation for reinforcement learning. In: Conference on Uncertainty in Artificial Intelligence, 2010 Nam et al [2021] Nam DW, Kim Y, Park CY (2021) Gmac: A distributional perspective on actor-critic framework. In: ICML 2021, PMLR, pp 7927–7936 Ororbia and Kifer [2022] Ororbia A, Kifer D (2022) The neural coding framework for learning generative models. Nature communications 13(1):2064 Parno and Marzouk [2018] Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. 
Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Lindenberg B, Nordqvist J, Lindahl KO (2022) Conjugated discrete distributions for distributional reinforcement learning. In: AAAI 2022, pp 7516–7524 Ma et al [2020] Ma X, Xia L, Zhou Z, et al (2020) Dsac: distributional soft actor critic for risk-sensitive reinforcement learning. arXiv preprint arXiv:200414547 Marzouk et al [2016] Marzouk Y, Moselhy T, Parno M, et al (2016) Sampling via measure transport: An introduction. Handbook of uncertainty quantification 1:2 Mnih et al [2015] Mnih V, Kavukcuoglu K, Silver D, et al (2015) Human-level control through deep reinforcement learning. nature 518(7540):529–533 MORIMURA [2010] MORIMURA T (2010) Parametric return density estimation for reinforcement learning. In: Conference on Uncertainty in Artificial Intelligence, 2010 Nam et al [2021] Nam DW, Kim Y, Park CY (2021) Gmac: A distributional perspective on actor-critic framework. In: ICML 2021, PMLR, pp 7927–7936 Ororbia and Kifer [2022] Ororbia A, Kifer D (2022) The neural coding framework for learning generative models. Nature communications 13(1):2064 Parno and Marzouk [2018] Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. 
Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Ma X, Xia L, Zhou Z, et al (2020) Dsac: distributional soft actor critic for risk-sensitive reinforcement learning. arXiv preprint arXiv:200414547 Marzouk et al [2016] Marzouk Y, Moselhy T, Parno M, et al (2016) Sampling via measure transport: An introduction. Handbook of uncertainty quantification 1:2 Mnih et al [2015] Mnih V, Kavukcuoglu K, Silver D, et al (2015) Human-level control through deep reinforcement learning. nature 518(7540):529–533 MORIMURA [2010] MORIMURA T (2010) Parametric return density estimation for reinforcement learning. In: Conference on Uncertainty in Artificial Intelligence, 2010 Nam et al [2021] Nam DW, Kim Y, Park CY (2021) Gmac: A distributional perspective on actor-critic framework. In: ICML 2021, PMLR, pp 7927–7936 Ororbia and Kifer [2022] Ororbia A, Kifer D (2022) The neural coding framework for learning generative models. Nature communications 13(1):2064 Parno and Marzouk [2018] Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. 
Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Marzouk Y, Moselhy T, Parno M, et al (2016) Sampling via measure transport: An introduction. Handbook of uncertainty quantification 1:2 Mnih et al [2015] Mnih V, Kavukcuoglu K, Silver D, et al (2015) Human-level control through deep reinforcement learning. nature 518(7540):529–533 MORIMURA [2010] MORIMURA T (2010) Parametric return density estimation for reinforcement learning. In: Conference on Uncertainty in Artificial Intelligence, 2010 Nam et al [2021] Nam DW, Kim Y, Park CY (2021) Gmac: A distributional perspective on actor-critic framework. In: ICML 2021, PMLR, pp 7927–7936 Ororbia and Kifer [2022] Ororbia A, Kifer D (2022) The neural coding framework for learning generative models. Nature communications 13(1):2064 Parno and Marzouk [2018] Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. 
Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Mnih V, Kavukcuoglu K, Silver D, et al (2015) Human-level control through deep reinforcement learning. nature 518(7540):529–533 MORIMURA [2010] MORIMURA T (2010) Parametric return density estimation for reinforcement learning. In: Conference on Uncertainty in Artificial Intelligence, 2010 Nam et al [2021] Nam DW, Kim Y, Park CY (2021) Gmac: A distributional perspective on actor-critic framework. In: ICML 2021, PMLR, pp 7927–7936 Ororbia and Kifer [2022] Ororbia A, Kifer D (2022) The neural coding framework for learning generative models. Nature communications 13(1):2064 Parno and Marzouk [2018] Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. 
Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University MORIMURA T (2010) Parametric return density estimation for reinforcement learning. In: Conference on Uncertainty in Artificial Intelligence, 2010 Nam et al [2021] Nam DW, Kim Y, Park CY (2021) Gmac: A distributional perspective on actor-critic framework. In: ICML 2021, PMLR, pp 7927–7936 Ororbia and Kifer [2022] Ororbia A, Kifer D (2022) The neural coding framework for learning generative models. Nature communications 13(1):2064 Parno and Marzouk [2018] Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. 
In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Nam DW, Kim Y, Park CY (2021) Gmac: A distributional perspective on actor-critic framework. In: ICML 2021, PMLR, pp 7927–7936 Ororbia and Kifer [2022] Ororbia A, Kifer D (2022) The neural coding framework for learning generative models. Nature communications 13(1):2064 Parno and Marzouk [2018] Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. 
Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Ororbia A, Kifer D (2022) The neural coding framework for learning generative models. Nature communications 13(1):2064 Parno and Marzouk [2018] Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. 
Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. 
In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. 
Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. 
Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. 
In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. 
Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. 
Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. 
In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. 
Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University
- Haarnoja T, Zhou A, Hartikainen K, et al (2018) Soft actor-critic algorithms and applications. arXiv preprint arXiv:181205905 Heess et al [2015] Heess N, Wayne G, Silver D, et al (2015) Learning continuous control policies by stochastic value gradients. Advances in neural information processing systems 28 Kingma and Welling [2013] Kingma DP, Welling M (2013) Auto-encoding variational bayes. arXiv preprint arXiv:13126114 Kingma et al [2014] Kingma DP, Mohamed S, Jimenez Rezende D, et al (2014) Semi-supervised learning with deep generative models. NIPS 2014 27 Konda and Tsitsiklis [1999] Konda V, Tsitsiklis J (1999) Actor-critic algorithms. Advances in neural information processing systems 12 Kuznetsov et al [2020] Kuznetsov A, Shvechikov P, Grishin A, et al (2020) Controlling overestimation bias with truncated mixture of continuous distributional quantile critics. In: International Conference on Machine Learning, PMLR, pp 5556–5566 Lillicrap et al [2015] Lillicrap TP, Hunt JJ, Pritzel A, et al (2015) Continuous control with deep reinforcement learning. arXiv preprint arXiv:150902971 Lindenberg et al [2022] Lindenberg B, Nordqvist J, Lindahl KO (2022) Conjugated discrete distributions for distributional reinforcement learning. In: AAAI 2022, pp 7516–7524 Ma et al [2020] Ma X, Xia L, Zhou Z, et al (2020) Dsac: distributional soft actor critic for risk-sensitive reinforcement learning. arXiv preprint arXiv:200414547 Marzouk et al [2016] Marzouk Y, Moselhy T, Parno M, et al (2016) Sampling via measure transport: An introduction. Handbook of uncertainty quantification 1:2 Mnih et al [2015] Mnih V, Kavukcuoglu K, Silver D, et al (2015) Human-level control through deep reinforcement learning. nature 518(7540):529–533 MORIMURA [2010] MORIMURA T (2010) Parametric return density estimation for reinforcement learning. In: Conference on Uncertainty in Artificial Intelligence, 2010 Nam et al [2021] Nam DW, Kim Y, Park CY (2021) Gmac: A distributional perspective on actor-critic framework. In: ICML 2021, PMLR, pp 7927–7936 Ororbia and Kifer [2022] Ororbia A, Kifer D (2022) The neural coding framework for learning generative models. Nature communications 13(1):2064 Parno and Marzouk [2018] Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. 
arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Heess N, Wayne G, Silver D, et al (2015) Learning continuous control policies by stochastic value gradients. Advances in neural information processing systems 28 Kingma and Welling [2013] Kingma DP, Welling M (2013) Auto-encoding variational bayes. arXiv preprint arXiv:13126114 Kingma et al [2014] Kingma DP, Mohamed S, Jimenez Rezende D, et al (2014) Semi-supervised learning with deep generative models. NIPS 2014 27 Konda and Tsitsiklis [1999] Konda V, Tsitsiklis J (1999) Actor-critic algorithms. Advances in neural information processing systems 12 Kuznetsov et al [2020] Kuznetsov A, Shvechikov P, Grishin A, et al (2020) Controlling overestimation bias with truncated mixture of continuous distributional quantile critics. In: International Conference on Machine Learning, PMLR, pp 5556–5566 Lillicrap et al [2015] Lillicrap TP, Hunt JJ, Pritzel A, et al (2015) Continuous control with deep reinforcement learning. arXiv preprint arXiv:150902971 Lindenberg et al [2022] Lindenberg B, Nordqvist J, Lindahl KO (2022) Conjugated discrete distributions for distributional reinforcement learning. In: AAAI 2022, pp 7516–7524 Ma et al [2020] Ma X, Xia L, Zhou Z, et al (2020) Dsac: distributional soft actor critic for risk-sensitive reinforcement learning. arXiv preprint arXiv:200414547 Marzouk et al [2016] Marzouk Y, Moselhy T, Parno M, et al (2016) Sampling via measure transport: An introduction. Handbook of uncertainty quantification 1:2 Mnih et al [2015] Mnih V, Kavukcuoglu K, Silver D, et al (2015) Human-level control through deep reinforcement learning. nature 518(7540):529–533 MORIMURA [2010] MORIMURA T (2010) Parametric return density estimation for reinforcement learning. In: Conference on Uncertainty in Artificial Intelligence, 2010 Nam et al [2021] Nam DW, Kim Y, Park CY (2021) Gmac: A distributional perspective on actor-critic framework. In: ICML 2021, PMLR, pp 7927–7936 Ororbia and Kifer [2022] Ororbia A, Kifer D (2022) The neural coding framework for learning generative models. Nature communications 13(1):2064 Parno and Marzouk [2018] Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. 
SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Kingma DP, Welling M (2013) Auto-encoding variational bayes. arXiv preprint arXiv:13126114 Kingma et al [2014] Kingma DP, Mohamed S, Jimenez Rezende D, et al (2014) Semi-supervised learning with deep generative models. NIPS 2014 27 Konda and Tsitsiklis [1999] Konda V, Tsitsiklis J (1999) Actor-critic algorithms. Advances in neural information processing systems 12 Kuznetsov et al [2020] Kuznetsov A, Shvechikov P, Grishin A, et al (2020) Controlling overestimation bias with truncated mixture of continuous distributional quantile critics. In: International Conference on Machine Learning, PMLR, pp 5556–5566 Lillicrap et al [2015] Lillicrap TP, Hunt JJ, Pritzel A, et al (2015) Continuous control with deep reinforcement learning. arXiv preprint arXiv:150902971 Lindenberg et al [2022] Lindenberg B, Nordqvist J, Lindahl KO (2022) Conjugated discrete distributions for distributional reinforcement learning. In: AAAI 2022, pp 7516–7524 Ma et al [2020] Ma X, Xia L, Zhou Z, et al (2020) Dsac: distributional soft actor critic for risk-sensitive reinforcement learning. 
arXiv preprint arXiv:200414547 Marzouk et al [2016] Marzouk Y, Moselhy T, Parno M, et al (2016) Sampling via measure transport: An introduction. Handbook of uncertainty quantification 1:2 Mnih et al [2015] Mnih V, Kavukcuoglu K, Silver D, et al (2015) Human-level control through deep reinforcement learning. nature 518(7540):529–533 MORIMURA [2010] MORIMURA T (2010) Parametric return density estimation for reinforcement learning. In: Conference on Uncertainty in Artificial Intelligence, 2010 Nam et al [2021] Nam DW, Kim Y, Park CY (2021) Gmac: A distributional perspective on actor-critic framework. In: ICML 2021, PMLR, pp 7927–7936 Ororbia and Kifer [2022] Ororbia A, Kifer D (2022) The neural coding framework for learning generative models. Nature communications 13(1):2064 Parno and Marzouk [2018] Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Kingma DP, Mohamed S, Jimenez Rezende D, et al (2014) Semi-supervised learning with deep generative models. NIPS 2014 27 Konda and Tsitsiklis [1999] Konda V, Tsitsiklis J (1999) Actor-critic algorithms. 
Advances in neural information processing systems 12 Kuznetsov et al [2020] Kuznetsov A, Shvechikov P, Grishin A, et al (2020) Controlling overestimation bias with truncated mixture of continuous distributional quantile critics. In: International Conference on Machine Learning, PMLR, pp 5556–5566 Lillicrap et al [2015] Lillicrap TP, Hunt JJ, Pritzel A, et al (2015) Continuous control with deep reinforcement learning. arXiv preprint arXiv:150902971 Lindenberg et al [2022] Lindenberg B, Nordqvist J, Lindahl KO (2022) Conjugated discrete distributions for distributional reinforcement learning. In: AAAI 2022, pp 7516–7524 Ma et al [2020] Ma X, Xia L, Zhou Z, et al (2020) Dsac: distributional soft actor critic for risk-sensitive reinforcement learning. arXiv preprint arXiv:200414547 Marzouk et al [2016] Marzouk Y, Moselhy T, Parno M, et al (2016) Sampling via measure transport: An introduction. Handbook of uncertainty quantification 1:2 Mnih et al [2015] Mnih V, Kavukcuoglu K, Silver D, et al (2015) Human-level control through deep reinforcement learning. nature 518(7540):529–533 MORIMURA [2010] MORIMURA T (2010) Parametric return density estimation for reinforcement learning. In: Conference on Uncertainty in Artificial Intelligence, 2010 Nam et al [2021] Nam DW, Kim Y, Park CY (2021) Gmac: A distributional perspective on actor-critic framework. In: ICML 2021, PMLR, pp 7927–7936 Ororbia and Kifer [2022] Ororbia A, Kifer D (2022) The neural coding framework for learning generative models. Nature communications 13(1):2064 Parno and Marzouk [2018] Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. 
MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Konda V, Tsitsiklis J (1999) Actor-critic algorithms. Advances in neural information processing systems 12 Kuznetsov et al [2020] Kuznetsov A, Shvechikov P, Grishin A, et al (2020) Controlling overestimation bias with truncated mixture of continuous distributional quantile critics. In: International Conference on Machine Learning, PMLR, pp 5556–5566 Lillicrap et al [2015] Lillicrap TP, Hunt JJ, Pritzel A, et al (2015) Continuous control with deep reinforcement learning. arXiv preprint arXiv:150902971 Lindenberg et al [2022] Lindenberg B, Nordqvist J, Lindahl KO (2022) Conjugated discrete distributions for distributional reinforcement learning. In: AAAI 2022, pp 7516–7524 Ma et al [2020] Ma X, Xia L, Zhou Z, et al (2020) Dsac: distributional soft actor critic for risk-sensitive reinforcement learning. arXiv preprint arXiv:200414547 Marzouk et al [2016] Marzouk Y, Moselhy T, Parno M, et al (2016) Sampling via measure transport: An introduction. Handbook of uncertainty quantification 1:2 Mnih et al [2015] Mnih V, Kavukcuoglu K, Silver D, et al (2015) Human-level control through deep reinforcement learning. nature 518(7540):529–533 MORIMURA [2010] MORIMURA T (2010) Parametric return density estimation for reinforcement learning. In: Conference on Uncertainty in Artificial Intelligence, 2010 Nam et al [2021] Nam DW, Kim Y, Park CY (2021) Gmac: A distributional perspective on actor-critic framework. In: ICML 2021, PMLR, pp 7927–7936 Ororbia and Kifer [2022] Ororbia A, Kifer D (2022) The neural coding framework for learning generative models. Nature communications 13(1):2064 Parno and Marzouk [2018] Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. 
arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Kuznetsov A, Shvechikov P, Grishin A, et al (2020) Controlling overestimation bias with truncated mixture of continuous distributional quantile critics. In: International Conference on Machine Learning, PMLR, pp 5556–5566 Lillicrap et al [2015] Lillicrap TP, Hunt JJ, Pritzel A, et al (2015) Continuous control with deep reinforcement learning. arXiv preprint arXiv:150902971 Lindenberg et al [2022] Lindenberg B, Nordqvist J, Lindahl KO (2022) Conjugated discrete distributions for distributional reinforcement learning. In: AAAI 2022, pp 7516–7524 Ma et al [2020] Ma X, Xia L, Zhou Z, et al (2020) Dsac: distributional soft actor critic for risk-sensitive reinforcement learning. arXiv preprint arXiv:200414547 Marzouk et al [2016] Marzouk Y, Moselhy T, Parno M, et al (2016) Sampling via measure transport: An introduction. Handbook of uncertainty quantification 1:2 Mnih et al [2015] Mnih V, Kavukcuoglu K, Silver D, et al (2015) Human-level control through deep reinforcement learning. nature 518(7540):529–533 MORIMURA [2010] MORIMURA T (2010) Parametric return density estimation for reinforcement learning. In: Conference on Uncertainty in Artificial Intelligence, 2010 Nam et al [2021] Nam DW, Kim Y, Park CY (2021) Gmac: A distributional perspective on actor-critic framework. In: ICML 2021, PMLR, pp 7927–7936 Ororbia and Kifer [2022] Ororbia A, Kifer D (2022) The neural coding framework for learning generative models. Nature communications 13(1):2064 Parno and Marzouk [2018] Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. 
Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Lillicrap TP, Hunt JJ, Pritzel A, et al (2015) Continuous control with deep reinforcement learning. arXiv preprint arXiv:150902971 Lindenberg et al [2022] Lindenberg B, Nordqvist J, Lindahl KO (2022) Conjugated discrete distributions for distributional reinforcement learning. In: AAAI 2022, pp 7516–7524 Ma et al [2020] Ma X, Xia L, Zhou Z, et al (2020) Dsac: distributional soft actor critic for risk-sensitive reinforcement learning. arXiv preprint arXiv:200414547 Marzouk et al [2016] Marzouk Y, Moselhy T, Parno M, et al (2016) Sampling via measure transport: An introduction. Handbook of uncertainty quantification 1:2 Mnih et al [2015] Mnih V, Kavukcuoglu K, Silver D, et al (2015) Human-level control through deep reinforcement learning. nature 518(7540):529–533 MORIMURA [2010] MORIMURA T (2010) Parametric return density estimation for reinforcement learning. In: Conference on Uncertainty in Artificial Intelligence, 2010 Nam et al [2021] Nam DW, Kim Y, Park CY (2021) Gmac: A distributional perspective on actor-critic framework. In: ICML 2021, PMLR, pp 7927–7936 Ororbia and Kifer [2022] Ororbia A, Kifer D (2022) The neural coding framework for learning generative models. Nature communications 13(1):2064 Parno and Marzouk [2018] Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. 
- Heess N, Wayne G, Silver D, et al (2015) Learning continuous control policies by stochastic value gradients. Advances in neural information processing systems 28 Kingma and Welling [2013] Kingma DP, Welling M (2013) Auto-encoding variational bayes. arXiv preprint arXiv:13126114 Kingma et al [2014] Kingma DP, Mohamed S, Jimenez Rezende D, et al (2014) Semi-supervised learning with deep generative models. NIPS 2014 27 Konda and Tsitsiklis [1999] Konda V, Tsitsiklis J (1999) Actor-critic algorithms. Advances in neural information processing systems 12 Kuznetsov et al [2020] Kuznetsov A, Shvechikov P, Grishin A, et al (2020) Controlling overestimation bias with truncated mixture of continuous distributional quantile critics. In: International Conference on Machine Learning, PMLR, pp 5556–5566 Lillicrap et al [2015] Lillicrap TP, Hunt JJ, Pritzel A, et al (2015) Continuous control with deep reinforcement learning. arXiv preprint arXiv:150902971 Lindenberg et al [2022] Lindenberg B, Nordqvist J, Lindahl KO (2022) Conjugated discrete distributions for distributional reinforcement learning. In: AAAI 2022, pp 7516–7524 Ma et al [2020] Ma X, Xia L, Zhou Z, et al (2020) Dsac: distributional soft actor critic for risk-sensitive reinforcement learning. arXiv preprint arXiv:200414547 Marzouk et al [2016] Marzouk Y, Moselhy T, Parno M, et al (2016) Sampling via measure transport: An introduction. Handbook of uncertainty quantification 1:2 Mnih et al [2015] Mnih V, Kavukcuoglu K, Silver D, et al (2015) Human-level control through deep reinforcement learning. nature 518(7540):529–533 MORIMURA [2010] MORIMURA T (2010) Parametric return density estimation for reinforcement learning. In: Conference on Uncertainty in Artificial Intelligence, 2010 Nam et al [2021] Nam DW, Kim Y, Park CY (2021) Gmac: A distributional perspective on actor-critic framework. In: ICML 2021, PMLR, pp 7927–7936 Ororbia and Kifer [2022] Ororbia A, Kifer D (2022) The neural coding framework for learning generative models. Nature communications 13(1):2064 Parno and Marzouk [2018] Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. 
In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Kingma DP, Welling M (2013) Auto-encoding variational bayes. arXiv preprint arXiv:13126114 Kingma et al [2014] Kingma DP, Mohamed S, Jimenez Rezende D, et al (2014) Semi-supervised learning with deep generative models. NIPS 2014 27 Konda and Tsitsiklis [1999] Konda V, Tsitsiklis J (1999) Actor-critic algorithms. Advances in neural information processing systems 12 Kuznetsov et al [2020] Kuznetsov A, Shvechikov P, Grishin A, et al (2020) Controlling overestimation bias with truncated mixture of continuous distributional quantile critics. In: International Conference on Machine Learning, PMLR, pp 5556–5566 Lillicrap et al [2015] Lillicrap TP, Hunt JJ, Pritzel A, et al (2015) Continuous control with deep reinforcement learning. arXiv preprint arXiv:150902971 Lindenberg et al [2022] Lindenberg B, Nordqvist J, Lindahl KO (2022) Conjugated discrete distributions for distributional reinforcement learning. In: AAAI 2022, pp 7516–7524 Ma et al [2020] Ma X, Xia L, Zhou Z, et al (2020) Dsac: distributional soft actor critic for risk-sensitive reinforcement learning. arXiv preprint arXiv:200414547 Marzouk et al [2016] Marzouk Y, Moselhy T, Parno M, et al (2016) Sampling via measure transport: An introduction. Handbook of uncertainty quantification 1:2 Mnih et al [2015] Mnih V, Kavukcuoglu K, Silver D, et al (2015) Human-level control through deep reinforcement learning. nature 518(7540):529–533 MORIMURA [2010] MORIMURA T (2010) Parametric return density estimation for reinforcement learning. In: Conference on Uncertainty in Artificial Intelligence, 2010 Nam et al [2021] Nam DW, Kim Y, Park CY (2021) Gmac: A distributional perspective on actor-critic framework. In: ICML 2021, PMLR, pp 7927–7936 Ororbia and Kifer [2022] Ororbia A, Kifer D (2022) The neural coding framework for learning generative models. Nature communications 13(1):2064 Parno and Marzouk [2018] Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. 
Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Kingma DP, Mohamed S, Jimenez Rezende D, et al (2014) Semi-supervised learning with deep generative models. NIPS 2014 27 Konda and Tsitsiklis [1999] Konda V, Tsitsiklis J (1999) Actor-critic algorithms. Advances in neural information processing systems 12 Kuznetsov et al [2020] Kuznetsov A, Shvechikov P, Grishin A, et al (2020) Controlling overestimation bias with truncated mixture of continuous distributional quantile critics. In: International Conference on Machine Learning, PMLR, pp 5556–5566 Lillicrap et al [2015] Lillicrap TP, Hunt JJ, Pritzel A, et al (2015) Continuous control with deep reinforcement learning. arXiv preprint arXiv:150902971 Lindenberg et al [2022] Lindenberg B, Nordqvist J, Lindahl KO (2022) Conjugated discrete distributions for distributional reinforcement learning. In: AAAI 2022, pp 7516–7524 Ma et al [2020] Ma X, Xia L, Zhou Z, et al (2020) Dsac: distributional soft actor critic for risk-sensitive reinforcement learning. arXiv preprint arXiv:200414547 Marzouk et al [2016] Marzouk Y, Moselhy T, Parno M, et al (2016) Sampling via measure transport: An introduction. Handbook of uncertainty quantification 1:2 Mnih et al [2015] Mnih V, Kavukcuoglu K, Silver D, et al (2015) Human-level control through deep reinforcement learning. nature 518(7540):529–533 MORIMURA [2010] MORIMURA T (2010) Parametric return density estimation for reinforcement learning. 
In: Conference on Uncertainty in Artificial Intelligence, 2010 Nam et al [2021] Nam DW, Kim Y, Park CY (2021) Gmac: A distributional perspective on actor-critic framework. In: ICML 2021, PMLR, pp 7927–7936 Ororbia and Kifer [2022] Ororbia A, Kifer D (2022) The neural coding framework for learning generative models. Nature communications 13(1):2064 Parno and Marzouk [2018] Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Konda V, Tsitsiklis J (1999) Actor-critic algorithms. Advances in neural information processing systems 12 Kuznetsov et al [2020] Kuznetsov A, Shvechikov P, Grishin A, et al (2020) Controlling overestimation bias with truncated mixture of continuous distributional quantile critics. In: International Conference on Machine Learning, PMLR, pp 5556–5566 Lillicrap et al [2015] Lillicrap TP, Hunt JJ, Pritzel A, et al (2015) Continuous control with deep reinforcement learning. arXiv preprint arXiv:150902971 Lindenberg et al [2022] Lindenberg B, Nordqvist J, Lindahl KO (2022) Conjugated discrete distributions for distributional reinforcement learning. 
nature 518(7540):529–533 MORIMURA [2010] MORIMURA T (2010) Parametric return density estimation for reinforcement learning. In: Conference on Uncertainty in Artificial Intelligence, 2010 Nam et al [2021] Nam DW, Kim Y, Park CY (2021) Gmac: A distributional perspective on actor-critic framework. In: ICML 2021, PMLR, pp 7927–7936 Ororbia and Kifer [2022] Ororbia A, Kifer D (2022) The neural coding framework for learning generative models. Nature communications 13(1):2064 Parno and Marzouk [2018] Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Lillicrap TP, Hunt JJ, Pritzel A, et al (2015) Continuous control with deep reinforcement learning. arXiv preprint arXiv:150902971 Lindenberg et al [2022] Lindenberg B, Nordqvist J, Lindahl KO (2022) Conjugated discrete distributions for distributional reinforcement learning. In: AAAI 2022, pp 7516–7524 Ma et al [2020] Ma X, Xia L, Zhou Z, et al (2020) Dsac: distributional soft actor critic for risk-sensitive reinforcement learning. 
arXiv preprint arXiv:200414547 Marzouk et al [2016] Marzouk Y, Moselhy T, Parno M, et al (2016) Sampling via measure transport: An introduction. Handbook of uncertainty quantification 1:2 Mnih et al [2015] Mnih V, Kavukcuoglu K, Silver D, et al (2015) Human-level control through deep reinforcement learning. nature 518(7540):529–533 MORIMURA [2010] MORIMURA T (2010) Parametric return density estimation for reinforcement learning. In: Conference on Uncertainty in Artificial Intelligence, 2010 Nam et al [2021] Nam DW, Kim Y, Park CY (2021) Gmac: A distributional perspective on actor-critic framework. In: ICML 2021, PMLR, pp 7927–7936 Ororbia and Kifer [2022] Ororbia A, Kifer D (2022) The neural coding framework for learning generative models. Nature communications 13(1):2064 Parno and Marzouk [2018] Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Lindenberg B, Nordqvist J, Lindahl KO (2022) Conjugated discrete distributions for distributional reinforcement learning. 
In: AAAI 2022, pp 7516–7524 Ma et al [2020] Ma X, Xia L, Zhou Z, et al (2020) Dsac: distributional soft actor critic for risk-sensitive reinforcement learning. arXiv preprint arXiv:200414547 Marzouk et al [2016] Marzouk Y, Moselhy T, Parno M, et al (2016) Sampling via measure transport: An introduction. Handbook of uncertainty quantification 1:2 Mnih et al [2015] Mnih V, Kavukcuoglu K, Silver D, et al (2015) Human-level control through deep reinforcement learning. nature 518(7540):529–533 MORIMURA [2010] MORIMURA T (2010) Parametric return density estimation for reinforcement learning. In: Conference on Uncertainty in Artificial Intelligence, 2010 Nam et al [2021] Nam DW, Kim Y, Park CY (2021) Gmac: A distributional perspective on actor-critic framework. In: ICML 2021, PMLR, pp 7927–7936 Ororbia and Kifer [2022] Ororbia A, Kifer D (2022) The neural coding framework for learning generative models. Nature communications 13(1):2064 Parno and Marzouk [2018] Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. 
Carnegie Mellon University Ma X, Xia L, Zhou Z, et al (2020) Dsac: distributional soft actor critic for risk-sensitive reinforcement learning. arXiv preprint arXiv:200414547 Marzouk et al [2016] Marzouk Y, Moselhy T, Parno M, et al (2016) Sampling via measure transport: An introduction. Handbook of uncertainty quantification 1:2 Mnih et al [2015] Mnih V, Kavukcuoglu K, Silver D, et al (2015) Human-level control through deep reinforcement learning. nature 518(7540):529–533 MORIMURA [2010] MORIMURA T (2010) Parametric return density estimation for reinforcement learning. In: Conference on Uncertainty in Artificial Intelligence, 2010 Nam et al [2021] Nam DW, Kim Y, Park CY (2021) Gmac: A distributional perspective on actor-critic framework. In: ICML 2021, PMLR, pp 7927–7936 Ororbia and Kifer [2022] Ororbia A, Kifer D (2022) The neural coding framework for learning generative models. Nature communications 13(1):2064 Parno and Marzouk [2018] Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. 
Carnegie Mellon University Marzouk Y, Moselhy T, Parno M, et al (2016) Sampling via measure transport: An introduction. Handbook of uncertainty quantification 1:2 Mnih et al [2015] Mnih V, Kavukcuoglu K, Silver D, et al (2015) Human-level control through deep reinforcement learning. nature 518(7540):529–533 MORIMURA [2010] MORIMURA T (2010) Parametric return density estimation for reinforcement learning. In: Conference on Uncertainty in Artificial Intelligence, 2010 Nam et al [2021] Nam DW, Kim Y, Park CY (2021) Gmac: A distributional perspective on actor-critic framework. In: ICML 2021, PMLR, pp 7927–7936 Ororbia and Kifer [2022] Ororbia A, Kifer D (2022) The neural coding framework for learning generative models. Nature communications 13(1):2064 Parno and Marzouk [2018] Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Mnih V, Kavukcuoglu K, Silver D, et al (2015) Human-level control through deep reinforcement learning. nature 518(7540):529–533 MORIMURA [2010] MORIMURA T (2010) Parametric return density estimation for reinforcement learning. 
In: Conference on Uncertainty in Artificial Intelligence, 2010 Nam et al [2021] Nam DW, Kim Y, Park CY (2021) Gmac: A distributional perspective on actor-critic framework. In: ICML 2021, PMLR, pp 7927–7936 Ororbia and Kifer [2022] Ororbia A, Kifer D (2022) The neural coding framework for learning generative models. Nature communications 13(1):2064 Parno and Marzouk [2018] Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University MORIMURA T (2010) Parametric return density estimation for reinforcement learning. In: Conference on Uncertainty in Artificial Intelligence, 2010 Nam et al [2021] Nam DW, Kim Y, Park CY (2021) Gmac: A distributional perspective on actor-critic framework. In: ICML 2021, PMLR, pp 7927–7936 Ororbia and Kifer [2022] Ororbia A, Kifer D (2022) The neural coding framework for learning generative models. Nature communications 13(1):2064 Parno and Marzouk [2018] Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. 
SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Nam DW, Kim Y, Park CY (2021) Gmac: A distributional perspective on actor-critic framework. In: ICML 2021, PMLR, pp 7927–7936 Ororbia and Kifer [2022] Ororbia A, Kifer D (2022) The neural coding framework for learning generative models. Nature communications 13(1):2064 Parno and Marzouk [2018] Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. 
Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Ororbia A, Kifer D (2022) The neural coding framework for learning generative models. Nature communications 13(1):2064 Parno and Marzouk [2018] Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. 
In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. 
Carnegie Mellon University Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. 
In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. 
Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. 
In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. 
Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. 
Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University
- Kingma et al [2014] Kingma DP, Mohamed S, Jimenez Rezende D, et al (2014) Semi-supervised learning with deep generative models. NIPS 2014 27
- Konda and Tsitsiklis [1999] Konda V, Tsitsiklis J (1999) Actor-critic algorithms. Advances in Neural Information Processing Systems 12
- Kuznetsov et al [2020] Kuznetsov A, Shvechikov P, Grishin A, et al (2020) Controlling overestimation bias with truncated mixture of continuous distributional quantile critics. In: ICML 2020, PMLR, pp 5556–5566
- Lillicrap et al [2015] Lillicrap TP, Hunt JJ, Pritzel A, et al (2015) Continuous control with deep reinforcement learning. arXiv preprint arXiv:150902971
- Lindenberg et al [2022] Lindenberg B, Nordqvist J, Lindahl KO (2022) Conjugated discrete distributions for distributional reinforcement learning. In: AAAI 2022, pp 7516–7524
- Ma et al [2020] Ma X, Xia L, Zhou Z, et al (2020) DSAC: distributional soft actor critic for risk-sensitive reinforcement learning. arXiv preprint arXiv:200414547
- Marzouk et al [2016] Marzouk Y, Moselhy T, Parno M, et al (2016) Sampling via measure transport: An introduction. Handbook of Uncertainty Quantification 1:2
- Mnih et al [2015] Mnih V, Kavukcuoglu K, Silver D, et al (2015) Human-level control through deep reinforcement learning. Nature 518(7540):529–533
- Morimura [2010] Morimura T (2010) Parametric return density estimation for reinforcement learning. In: Conference on Uncertainty in Artificial Intelligence, 2010
- Nam et al [2021] Nam DW, Kim Y, Park CY (2021) GMAC: A distributional perspective on actor-critic framework. In: ICML 2021, PMLR, pp 7927–7936
- Ororbia and Kifer [2022] Ororbia A, Kifer D (2022) The neural coding framework for learning generative models. Nature Communications 13(1):2064
- Parno and Marzouk [2018] Parno MD, Marzouk YM (2018) Transport map accelerated Markov chain Monte Carlo. SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682
- Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for Markov chain Monte Carlo. Adv Comput Math 45:2321–2348
- Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607
- Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward MDPs. Machine Learning 105(3):367–417
- Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693
- Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37
- Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) TD algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif Intell 16(3):353–362
- Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438
- Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968
- Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688
- Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: ICLR 2018
- Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT Press
- Villani [2009] Villani C (2009) Optimal transport: old and new, vol 338. Springer
- Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32
- Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147
- Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. PhD thesis, Carnegie Mellon University
In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Kuznetsov A, Shvechikov P, Grishin A, et al (2020) Controlling overestimation bias with truncated mixture of continuous distributional quantile critics. In: International Conference on Machine Learning, PMLR, pp 5556–5566 Lillicrap et al [2015] Lillicrap TP, Hunt JJ, Pritzel A, et al (2015) Continuous control with deep reinforcement learning. arXiv preprint arXiv:150902971 Lindenberg et al [2022] Lindenberg B, Nordqvist J, Lindahl KO (2022) Conjugated discrete distributions for distributional reinforcement learning. In: AAAI 2022, pp 7516–7524 Ma et al [2020] Ma X, Xia L, Zhou Z, et al (2020) Dsac: distributional soft actor critic for risk-sensitive reinforcement learning. arXiv preprint arXiv:200414547 Marzouk et al [2016] Marzouk Y, Moselhy T, Parno M, et al (2016) Sampling via measure transport: An introduction. Handbook of uncertainty quantification 1:2 Mnih et al [2015] Mnih V, Kavukcuoglu K, Silver D, et al (2015) Human-level control through deep reinforcement learning. nature 518(7540):529–533 MORIMURA [2010] MORIMURA T (2010) Parametric return density estimation for reinforcement learning. In: Conference on Uncertainty in Artificial Intelligence, 2010 Nam et al [2021] Nam DW, Kim Y, Park CY (2021) Gmac: A distributional perspective on actor-critic framework. In: ICML 2021, PMLR, pp 7927–7936 Ororbia and Kifer [2022] Ororbia A, Kifer D (2022) The neural coding framework for learning generative models. Nature communications 13(1):2064 Parno and Marzouk [2018] Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. 
Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Lillicrap TP, Hunt JJ, Pritzel A, et al (2015) Continuous control with deep reinforcement learning. arXiv preprint arXiv:150902971 Lindenberg et al [2022] Lindenberg B, Nordqvist J, Lindahl KO (2022) Conjugated discrete distributions for distributional reinforcement learning. In: AAAI 2022, pp 7516–7524 Ma et al [2020] Ma X, Xia L, Zhou Z, et al (2020) Dsac: distributional soft actor critic for risk-sensitive reinforcement learning. arXiv preprint arXiv:200414547 Marzouk et al [2016] Marzouk Y, Moselhy T, Parno M, et al (2016) Sampling via measure transport: An introduction. Handbook of uncertainty quantification 1:2 Mnih et al [2015] Mnih V, Kavukcuoglu K, Silver D, et al (2015) Human-level control through deep reinforcement learning. nature 518(7540):529–533 MORIMURA [2010] MORIMURA T (2010) Parametric return density estimation for reinforcement learning. In: Conference on Uncertainty in Artificial Intelligence, 2010 Nam et al [2021] Nam DW, Kim Y, Park CY (2021) Gmac: A distributional perspective on actor-critic framework. In: ICML 2021, PMLR, pp 7927–7936 Ororbia and Kifer [2022] Ororbia A, Kifer D (2022) The neural coding framework for learning generative models. Nature communications 13(1):2064 Parno and Marzouk [2018] Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. 
SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Lindenberg B, Nordqvist J, Lindahl KO (2022) Conjugated discrete distributions for distributional reinforcement learning. In: AAAI 2022, pp 7516–7524 Ma et al [2020] Ma X, Xia L, Zhou Z, et al (2020) Dsac: distributional soft actor critic for risk-sensitive reinforcement learning. arXiv preprint arXiv:200414547 Marzouk et al [2016] Marzouk Y, Moselhy T, Parno M, et al (2016) Sampling via measure transport: An introduction. Handbook of uncertainty quantification 1:2 Mnih et al [2015] Mnih V, Kavukcuoglu K, Silver D, et al (2015) Human-level control through deep reinforcement learning. nature 518(7540):529–533 MORIMURA [2010] MORIMURA T (2010) Parametric return density estimation for reinforcement learning. In: Conference on Uncertainty in Artificial Intelligence, 2010 Nam et al [2021] Nam DW, Kim Y, Park CY (2021) Gmac: A distributional perspective on actor-critic framework. In: ICML 2021, PMLR, pp 7927–7936 Ororbia and Kifer [2022] Ororbia A, Kifer D (2022) The neural coding framework for learning generative models. 
Nature communications 13(1):2064 Parno and Marzouk [2018] Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Ma X, Xia L, Zhou Z, et al (2020) Dsac: distributional soft actor critic for risk-sensitive reinforcement learning. arXiv preprint arXiv:200414547 Marzouk et al [2016] Marzouk Y, Moselhy T, Parno M, et al (2016) Sampling via measure transport: An introduction. Handbook of uncertainty quantification 1:2 Mnih et al [2015] Mnih V, Kavukcuoglu K, Silver D, et al (2015) Human-level control through deep reinforcement learning. nature 518(7540):529–533 MORIMURA [2010] MORIMURA T (2010) Parametric return density estimation for reinforcement learning. In: Conference on Uncertainty in Artificial Intelligence, 2010 Nam et al [2021] Nam DW, Kim Y, Park CY (2021) Gmac: A distributional perspective on actor-critic framework. In: ICML 2021, PMLR, pp 7927–7936 Ororbia and Kifer [2022] Ororbia A, Kifer D (2022) The neural coding framework for learning generative models. 
Nature communications 13(1):2064 Parno and Marzouk [2018] Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Marzouk Y, Moselhy T, Parno M, et al (2016) Sampling via measure transport: An introduction. Handbook of uncertainty quantification 1:2 Mnih et al [2015] Mnih V, Kavukcuoglu K, Silver D, et al (2015) Human-level control through deep reinforcement learning. nature 518(7540):529–533 MORIMURA [2010] MORIMURA T (2010) Parametric return density estimation for reinforcement learning. In: Conference on Uncertainty in Artificial Intelligence, 2010 Nam et al [2021] Nam DW, Kim Y, Park CY (2021) Gmac: A distributional perspective on actor-critic framework. In: ICML 2021, PMLR, pp 7927–7936 Ororbia and Kifer [2022] Ororbia A, Kifer D (2022) The neural coding framework for learning generative models. Nature communications 13(1):2064 Parno and Marzouk [2018] Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. 
SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Mnih V, Kavukcuoglu K, Silver D, et al (2015) Human-level control through deep reinforcement learning. nature 518(7540):529–533 MORIMURA [2010] MORIMURA T (2010) Parametric return density estimation for reinforcement learning. In: Conference on Uncertainty in Artificial Intelligence, 2010 Nam et al [2021] Nam DW, Kim Y, Park CY (2021) Gmac: A distributional perspective on actor-critic framework. In: ICML 2021, PMLR, pp 7927–7936 Ororbia and Kifer [2022] Ororbia A, Kifer D (2022) The neural coding framework for learning generative models. Nature communications 13(1):2064 Parno and Marzouk [2018] Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. 
Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University MORIMURA T (2010) Parametric return density estimation for reinforcement learning. In: Conference on Uncertainty in Artificial Intelligence, 2010 Nam et al [2021] Nam DW, Kim Y, Park CY (2021) Gmac: A distributional perspective on actor-critic framework. In: ICML 2021, PMLR, pp 7927–7936 Ororbia and Kifer [2022] Ororbia A, Kifer D (2022) The neural coding framework for learning generative models. Nature communications 13(1):2064 Parno and Marzouk [2018] Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. 
In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Nam DW, Kim Y, Park CY (2021) Gmac: A distributional perspective on actor-critic framework. In: ICML 2021, PMLR, pp 7927–7936 Ororbia and Kifer [2022] Ororbia A, Kifer D (2022) The neural coding framework for learning generative models. Nature communications 13(1):2064 Parno and Marzouk [2018] Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. 
In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Ororbia A, Kifer D (2022) The neural coding framework for learning generative models. Nature communications 13(1):2064 Parno and Marzouk [2018] Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. 
Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. 
In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. 
NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. 
In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. 
Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. 
Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University
- Konda and Tsitsiklis [1999] Konda V, Tsitsiklis J (1999) Actor-critic algorithms. Advances in Neural Information Processing Systems 12
- Kuznetsov et al [2020] Kuznetsov A, Shvechikov P, Grishin A, et al (2020) Controlling overestimation bias with truncated mixture of continuous distributional quantile critics. In: ICML 2020, PMLR, pp 5556–5566
- Lillicrap et al [2015] Lillicrap TP, Hunt JJ, Pritzel A, et al (2015) Continuous control with deep reinforcement learning. arXiv preprint arXiv:1509.02971
- Lindenberg et al [2022] Lindenberg B, Nordqvist J, Lindahl KO (2022) Conjugated discrete distributions for distributional reinforcement learning. In: AAAI 2022, pp 7516–7524
- Ma et al [2020] Ma X, Xia L, Zhou Z, et al (2020) DSAC: Distributional soft actor critic for risk-sensitive reinforcement learning. arXiv preprint arXiv:2004.14547
- Marzouk et al [2016] Marzouk Y, Moselhy T, Parno M, et al (2016) Sampling via measure transport: An introduction. Handbook of Uncertainty Quantification 1:2
- Mnih et al [2015] Mnih V, Kavukcuoglu K, Silver D, et al (2015) Human-level control through deep reinforcement learning. Nature 518(7540):529–533
- Morimura [2010] Morimura T (2010) Parametric return density estimation for reinforcement learning. In: Conference on Uncertainty in Artificial Intelligence, 2010
- Nam et al [2021] Nam DW, Kim Y, Park CY (2021) GMAC: A distributional perspective on actor-critic framework. In: ICML 2021, PMLR, pp 7927–7936
- Ororbia and Kifer [2022] Ororbia A, Kifer D (2022) The neural coding framework for learning generative models. Nature Communications 13(1):2064
- Parno and Marzouk [2018] Parno MD, Marzouk YM (2018) Transport map accelerated Markov chain Monte Carlo. SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682
- Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for Markov chain Monte Carlo. Adv Comput Math 45:2321–2348
- Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607
- Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward MDPs. Machine Learning 105(3):367–417
- Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693
- Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37
- Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) TD algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362
- Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:1506.02438
- Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968
- Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688
- Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: ICLR 2018
- Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT Press
- Villani [2009] Villani C (2009) Optimal transport: Old and new, vol 338. Springer
- Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. Advances in Neural Information Processing Systems 32
- Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147
- Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. PhD thesis, Carnegie Mellon University
In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Marzouk Y, Moselhy T, Parno M, et al (2016) Sampling via measure transport: An introduction. Handbook of uncertainty quantification 1:2 Mnih et al [2015] Mnih V, Kavukcuoglu K, Silver D, et al (2015) Human-level control through deep reinforcement learning. nature 518(7540):529–533 MORIMURA [2010] MORIMURA T (2010) Parametric return density estimation for reinforcement learning. In: Conference on Uncertainty in Artificial Intelligence, 2010 Nam et al [2021] Nam DW, Kim Y, Park CY (2021) Gmac: A distributional perspective on actor-critic framework. In: ICML 2021, PMLR, pp 7927–7936 Ororbia and Kifer [2022] Ororbia A, Kifer D (2022) The neural coding framework for learning generative models. Nature communications 13(1):2064 Parno and Marzouk [2018] Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. 
In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Mnih V, Kavukcuoglu K, Silver D, et al (2015) Human-level control through deep reinforcement learning. nature 518(7540):529–533 MORIMURA [2010] MORIMURA T (2010) Parametric return density estimation for reinforcement learning. In: Conference on Uncertainty in Artificial Intelligence, 2010 Nam et al [2021] Nam DW, Kim Y, Park CY (2021) Gmac: A distributional perspective on actor-critic framework. In: ICML 2021, PMLR, pp 7927–7936 Ororbia and Kifer [2022] Ororbia A, Kifer D (2022) The neural coding framework for learning generative models. Nature communications 13(1):2064 Parno and Marzouk [2018] Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. 
MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University MORIMURA T (2010) Parametric return density estimation for reinforcement learning. In: Conference on Uncertainty in Artificial Intelligence, 2010 Nam et al [2021] Nam DW, Kim Y, Park CY (2021) Gmac: A distributional perspective on actor-critic framework. In: ICML 2021, PMLR, pp 7927–7936 Ororbia and Kifer [2022] Ororbia A, Kifer D (2022) The neural coding framework for learning generative models. Nature communications 13(1):2064 Parno and Marzouk [2018] Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. 
Carnegie Mellon University Nam DW, Kim Y, Park CY (2021) Gmac: A distributional perspective on actor-critic framework. In: ICML 2021, PMLR, pp 7927–7936 Ororbia and Kifer [2022] Ororbia A, Kifer D (2022) The neural coding framework for learning generative models. Nature communications 13(1):2064 Parno and Marzouk [2018] Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Ororbia A, Kifer D (2022) The neural coding framework for learning generative models. Nature communications 13(1):2064 Parno and Marzouk [2018] Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. 
Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. 
In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. 
Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. 
In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. 
In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. 
Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. 
NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University
- Lillicrap TP, Hunt JJ, Pritzel A, et al (2015) Continuous control with deep reinforcement learning. arXiv preprint arXiv:150902971 Lindenberg et al [2022] Lindenberg B, Nordqvist J, Lindahl KO (2022) Conjugated discrete distributions for distributional reinforcement learning. In: AAAI 2022, pp 7516–7524 Ma et al [2020] Ma X, Xia L, Zhou Z, et al (2020) Dsac: distributional soft actor critic for risk-sensitive reinforcement learning. arXiv preprint arXiv:200414547 Marzouk et al [2016] Marzouk Y, Moselhy T, Parno M, et al (2016) Sampling via measure transport: An introduction. Handbook of uncertainty quantification 1:2 Mnih et al [2015] Mnih V, Kavukcuoglu K, Silver D, et al (2015) Human-level control through deep reinforcement learning. nature 518(7540):529–533 MORIMURA [2010] MORIMURA T (2010) Parametric return density estimation for reinforcement learning. In: Conference on Uncertainty in Artificial Intelligence, 2010 Nam et al [2021] Nam DW, Kim Y, Park CY (2021) Gmac: A distributional perspective on actor-critic framework. In: ICML 2021, PMLR, pp 7927–7936 Ororbia and Kifer [2022] Ororbia A, Kifer D (2022) The neural coding framework for learning generative models. Nature communications 13(1):2064 Parno and Marzouk [2018] Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. 
NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Lindenberg B, Nordqvist J, Lindahl KO (2022) Conjugated discrete distributions for distributional reinforcement learning. In: AAAI 2022, pp 7516–7524 Ma et al [2020] Ma X, Xia L, Zhou Z, et al (2020) Dsac: distributional soft actor critic for risk-sensitive reinforcement learning. arXiv preprint arXiv:200414547 Marzouk et al [2016] Marzouk Y, Moselhy T, Parno M, et al (2016) Sampling via measure transport: An introduction. Handbook of uncertainty quantification 1:2 Mnih et al [2015] Mnih V, Kavukcuoglu K, Silver D, et al (2015) Human-level control through deep reinforcement learning. nature 518(7540):529–533 MORIMURA [2010] MORIMURA T (2010) Parametric return density estimation for reinforcement learning. In: Conference on Uncertainty in Artificial Intelligence, 2010 Nam et al [2021] Nam DW, Kim Y, Park CY (2021) Gmac: A distributional perspective on actor-critic framework. In: ICML 2021, PMLR, pp 7927–7936 Ororbia and Kifer [2022] Ororbia A, Kifer D (2022) The neural coding framework for learning generative models. Nature communications 13(1):2064 Parno and Marzouk [2018] Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. 
Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Ma X, Xia L, Zhou Z, et al (2020) Dsac: distributional soft actor critic for risk-sensitive reinforcement learning. arXiv preprint arXiv:200414547 Marzouk et al [2016] Marzouk Y, Moselhy T, Parno M, et al (2016) Sampling via measure transport: An introduction. Handbook of uncertainty quantification 1:2 Mnih et al [2015] Mnih V, Kavukcuoglu K, Silver D, et al (2015) Human-level control through deep reinforcement learning. nature 518(7540):529–533 MORIMURA [2010] MORIMURA T (2010) Parametric return density estimation for reinforcement learning. In: Conference on Uncertainty in Artificial Intelligence, 2010 Nam et al [2021] Nam DW, Kim Y, Park CY (2021) Gmac: A distributional perspective on actor-critic framework. In: ICML 2021, PMLR, pp 7927–7936 Ororbia and Kifer [2022] Ororbia A, Kifer D (2022) The neural coding framework for learning generative models. Nature communications 13(1):2064 Parno and Marzouk [2018] Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. 
Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Marzouk Y, Moselhy T, Parno M, et al (2016) Sampling via measure transport: An introduction. Handbook of uncertainty quantification 1:2 Mnih et al [2015] Mnih V, Kavukcuoglu K, Silver D, et al (2015) Human-level control through deep reinforcement learning. nature 518(7540):529–533 MORIMURA [2010] MORIMURA T (2010) Parametric return density estimation for reinforcement learning. In: Conference on Uncertainty in Artificial Intelligence, 2010 Nam et al [2021] Nam DW, Kim Y, Park CY (2021) Gmac: A distributional perspective on actor-critic framework. In: ICML 2021, PMLR, pp 7927–7936 Ororbia and Kifer [2022] Ororbia A, Kifer D (2022) The neural coding framework for learning generative models. Nature communications 13(1):2064 Parno and Marzouk [2018] Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. 
Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Mnih V, Kavukcuoglu K, Silver D, et al (2015) Human-level control through deep reinforcement learning. nature 518(7540):529–533 MORIMURA [2010] MORIMURA T (2010) Parametric return density estimation for reinforcement learning. In: Conference on Uncertainty in Artificial Intelligence, 2010 Nam et al [2021] Nam DW, Kim Y, Park CY (2021) Gmac: A distributional perspective on actor-critic framework. In: ICML 2021, PMLR, pp 7927–7936 Ororbia and Kifer [2022] Ororbia A, Kifer D (2022) The neural coding framework for learning generative models. Nature communications 13(1):2064 Parno and Marzouk [2018] Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University MORIMURA T (2010) Parametric return density estimation for reinforcement learning. 
In: Conference on Uncertainty in Artificial Intelligence, 2010 Nam et al [2021] Nam DW, Kim Y, Park CY (2021) Gmac: A distributional perspective on actor-critic framework. In: ICML 2021, PMLR, pp 7927–7936 Ororbia and Kifer [2022] Ororbia A, Kifer D (2022) The neural coding framework for learning generative models. Nature communications 13(1):2064 Parno and Marzouk [2018] Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Nam DW, Kim Y, Park CY (2021) Gmac: A distributional perspective on actor-critic framework. In: ICML 2021, PMLR, pp 7927–7936 Ororbia and Kifer [2022] Ororbia A, Kifer D (2022) The neural coding framework for learning generative models. Nature communications 13(1):2064 Parno and Marzouk [2018] Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. 
Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Ororbia A, Kifer D (2022) The neural coding framework for learning generative models. Nature communications 13(1):2064 Parno and Marzouk [2018] Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. 
Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. 
Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. 
In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. 
Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. 
Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. 
In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. 
Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University
- Lindenberg et al [2022] Lindenberg B, Nordqvist J, Lindahl KO (2022) Conjugated discrete distributions for distributional reinforcement learning. In: AAAI 2022, pp 7516–7524
- Ma et al [2020] Ma X, Xia L, Zhou Z, et al (2020) DSAC: Distributional soft actor critic for risk-sensitive reinforcement learning. arXiv preprint arXiv:200414547
- Marzouk et al [2016] Marzouk Y, Moselhy T, Parno M, et al (2016) Sampling via measure transport: An introduction. Handbook of Uncertainty Quantification 1:2
- Mnih et al [2015] Mnih V, Kavukcuoglu K, Silver D, et al (2015) Human-level control through deep reinforcement learning. Nature 518(7540):529–533
- Morimura [2010] Morimura T (2010) Parametric return density estimation for reinforcement learning. In: Conference on Uncertainty in Artificial Intelligence, 2010
- Nam et al [2021] Nam DW, Kim Y, Park CY (2021) GMAC: A distributional perspective on actor-critic framework. In: ICML 2021, PMLR, pp 7927–7936
- Ororbia and Kifer [2022] Ororbia A, Kifer D (2022) The neural coding framework for learning generative models. Nature Communications 13(1):2064
- Parno and Marzouk [2018] Parno MD, Marzouk YM (2018) Transport map accelerated Markov chain Monte Carlo. SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682
- Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for Markov chain Monte Carlo. Adv Comput Math 45:2321–2348
- Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607
- Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward MDPs. Machine Learning 105(3):367–417
- Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693
- Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37
- Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) TD algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362
- Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438
- Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968
- Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688
- Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018
- Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT Press
- Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer
- Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. In: NeurIPS 2019, vol 32
- Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147
- Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. PhD thesis, Carnegie Mellon University
Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. 
Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. 
Carnegie Mellon University Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University
Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Ororbia A, Kifer D (2022) The neural coding framework for learning generative models. Nature communications 13(1):2064 Parno and Marzouk [2018] Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. 
In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. 
Carnegie Mellon University Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. 
In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. 
Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. 
In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. 
Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. 
Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University
- Mnih V, Kavukcuoglu K, Silver D, et al (2015) Human-level control through deep reinforcement learning. nature 518(7540):529–533 MORIMURA [2010] MORIMURA T (2010) Parametric return density estimation for reinforcement learning. In: Conference on Uncertainty in Artificial Intelligence, 2010 Nam et al [2021] Nam DW, Kim Y, Park CY (2021) Gmac: A distributional perspective on actor-critic framework. In: ICML 2021, PMLR, pp 7927–7936 Ororbia and Kifer [2022] Ororbia A, Kifer D (2022) The neural coding framework for learning generative models. Nature communications 13(1):2064 Parno and Marzouk [2018] Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University MORIMURA T (2010) Parametric return density estimation for reinforcement learning. In: Conference on Uncertainty in Artificial Intelligence, 2010 Nam et al [2021] Nam DW, Kim Y, Park CY (2021) Gmac: A distributional perspective on actor-critic framework. In: ICML 2021, PMLR, pp 7927–7936 Ororbia and Kifer [2022] Ororbia A, Kifer D (2022) The neural coding framework for learning generative models. 
Nature communications 13(1):2064 Parno and Marzouk [2018] Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Nam DW, Kim Y, Park CY (2021) Gmac: A distributional perspective on actor-critic framework. In: ICML 2021, PMLR, pp 7927–7936 Ororbia and Kifer [2022] Ororbia A, Kifer D (2022) The neural coding framework for learning generative models. Nature communications 13(1):2064 Parno and Marzouk [2018] Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. 
Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Ororbia A, Kifer D (2022) The neural coding framework for learning generative models. Nature communications 13(1):2064 Parno and Marzouk [2018] Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. 
In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. 
Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. 
arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. 
In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. 
- MORIMURA T (2010) Parametric return density estimation for reinforcement learning. In: Conference on Uncertainty in Artificial Intelligence, 2010 Nam et al [2021] Nam DW, Kim Y, Park CY (2021) Gmac: A distributional perspective on actor-critic framework. In: ICML 2021, PMLR, pp 7927–7936 Ororbia and Kifer [2022] Ororbia A, Kifer D (2022) The neural coding framework for learning generative models. Nature communications 13(1):2064 Parno and Marzouk [2018] Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Nam DW, Kim Y, Park CY (2021) Gmac: A distributional perspective on actor-critic framework. In: ICML 2021, PMLR, pp 7927–7936 Ororbia and Kifer [2022] Ororbia A, Kifer D (2022) The neural coding framework for learning generative models. Nature communications 13(1):2064 Parno and Marzouk [2018] Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. 
SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Ororbia A, Kifer D (2022) The neural coding framework for learning generative models. Nature communications 13(1):2064 Parno and Marzouk [2018] Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. 
In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. 
MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. 
In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. 
Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. 
Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. 
In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. 
Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University
- Nam DW, Kim Y, Park CY (2021) Gmac: A distributional perspective on actor-critic framework. In: ICML 2021, PMLR, pp 7927–7936 Ororbia and Kifer [2022] Ororbia A, Kifer D (2022) The neural coding framework for learning generative models. Nature communications 13(1):2064 Parno and Marzouk [2018] Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Ororbia A, Kifer D (2022) The neural coding framework for learning generative models. Nature communications 13(1):2064 Parno and Marzouk [2018] Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. 
Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. 
In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. 
Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. 
In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. 
In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT Press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. Advances in Neural Information Processing Systems 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. PhD thesis, Carnegie Mellon University
- Ororbia A, Kifer D (2022) The neural coding framework for learning generative models. Nature Communications 13(1):2064 Parno and Marzouk [2018] Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) TD algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688
- Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. 
In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. 
MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University
- Sato M, Kimura H, Kobayashi S (2001) TD algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif Intell 16(3):353–362
- Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438
- Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968
- Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688
- Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: ICLR 2018
- Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT Press
- Villani C (2009) Optimal transport: old and new, vol 338. Springer
- Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. Advances in Neural Information Processing Systems 32
- Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147
- Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. PhD thesis, Carnegie Mellon University