Tsallis Entropy Regularization for Linearly Solvable MDP and Linear Quadratic Regulator (2403.01805v1)

Published 4 Mar 2024 in math.OC, cs.LG, cs.SY, and eess.SY

Abstract: Shannon entropy regularization is widely adopted in optimal control because it promotes exploration and enhances robustness; a well-known example is maximum entropy reinforcement learning, as in Soft Actor-Critic. In this paper, Tsallis entropy, a one-parameter extension of Shannon entropy, is used to regularize linearly solvable MDPs and linear quadratic regulators. We derive the solutions to these problems and demonstrate how this regularization balances exploration against sparsity of the resulting control law.
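The following sketch makes the exploration/sparsity trade-off concrete in the simplest possible setting. It uses a single decision over finitely many actions with rewards r_i, not the paper's linearly solvable MDP or LQR. It assumes Tsallis's original definition S_q(p) = (1 - sum_i p_i^q)/(q - 1), with the Boltzmann constant set to 1. This definition tends to the Shannon entropy -sum_i p_i log p_i as q -> 1. The regularization weight lam, the rewards r, and all function names are illustrative assumptions, not taken from the paper. For q > 1, the KKT conditions give the maximizer of sum_i p_i r_i + lam * S_q(p) over the probability simplex as p_i = [c (r_i - nu)]_+^(1/(q-1)), with c = (q - 1)/(lam q). Here the threshold nu is chosen so that the probabilities sum to 1. For q = 2 this reduces to sparsemax applied to r/(2 lam). Actions whose reward falls below nu receive exactly zero probability. The Shannon case (q = 1) instead yields a softmax policy, which is never sparse.

```python
import math
from typing import List


def tsallis_entropy(p: List[float], q: float) -> float:
    """Tsallis entropy S_q(p) = (1 - sum_i p_i^q) / (q - 1); q == 1 is the Shannon limit."""
    if abs(q - 1.0) < 1e-12:
        # Limit q -> 1: Shannon entropy -sum p log p, with the convention 0 log 0 = 0
        return -sum(pi * math.log(pi) for pi in p if pi > 0.0)
    return (1.0 - sum(pi ** q for pi in p)) / (q - 1.0)


def softmax_policy(r: List[float], lam: float) -> List[float]:
    """Maximizer of <p, r> + lam * S_1(p) (Shannon case): the softmax of r / lam."""
    m = max(r)  # subtract the max for numerical stability
    w = [math.exp((ri - m) / lam) for ri in r]
    z = sum(w)
    return [wi / z for wi in w]


def tsallis_policy(r: List[float], lam: float, q: float, iters: int = 200) -> List[float]:
    """Maximizer of <p, r> + lam * S_q(p) over the probability simplex, for q > 1.

    KKT conditions give p_i = [c * (r_i - nu)]_+ ** (1/(q-1)) with c = (q-1)/(lam*q);
    the threshold nu is located by bisection so that sum_i p_i = 1.
    """
    if q <= 1.0:
        raise ValueError("this sketch assumes q > 1; use softmax_policy for q == 1")
    c = (q - 1.0) / (lam * q)
    expo = 1.0 / (q - 1.0)

    def mass(nu: float) -> float:
        # Total probability mass implied by threshold nu (non-increasing in nu)
        return sum(max(c * (ri - nu), 0.0) ** expo for ri in r)

    hi = max(r)            # mass(hi) == 0
    lo = max(r) - 1.0 / c  # the best action alone contributes mass 1, so mass(lo) >= 1
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if mass(mid) > 1.0:
            lo = mid
        else:
            hi = mid
    nu = 0.5 * (lo + hi)
    p = [max(c * (ri - nu), 0.0) ** expo for ri in r]
    s = sum(p)
    return [pi / s for pi in p]  # renormalize away residual bisection error
```

With rewards r = [1.0, 0.9, 0.1], lam = 0.1 and q = 2, tsallis_policy returns [0.75, 0.25, 0.0], so the poor third action is switched off entirely. With the same lam, softmax_policy returns roughly [0.731, 0.269, 0.00009], so every action keeps some probability. Bisection is used for nu because it works for any q > 1. For q = 2 alone, an exact sort-based sparsemax routine would also work.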
