
Generalized Maximum Entropy Differential Dynamic Programming (2403.18130v2)

Published 26 Mar 2024 in math.OC, cs.IT, and math.IT

Abstract: We present a sampling-based trajectory optimization method derived from the maximum entropy formulation of Differential Dynamic Programming with Tsallis entropy. The method generalizes earlier work based on Shannon entropy, which yields a Gaussian optimal control policy for exploration during optimization. With Tsallis entropy, the policy instead takes the form of a $q$-Gaussian, whose heavy tails further encourage exploration. Moreover, the sampling variance is scaled according to the value function of the trajectory. This scaling mechanism is unique to the Tsallis-entropy algorithm; the original Shannon-entropy formulation scales the variance with a fixed temperature parameter. As a result, the proposed algorithms promote exploration precisely when it is needed, that is, when the trajectory cost is high. Simulation results on two robotic systems with multimodal costs demonstrate these properties.
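The two ingredients the abstract describes can be illustrated with a short sketch. The snippet below draws $q$-Gaussian deviates via the generalized Box-Müller method of Thistleton et al. (2007), which the paper cites for exactly this purpose, and then feeds them through a hypothetical cost-dependent noise scale. The function names, the linear scaling rule, and the parameter `alpha` are illustrative assumptions, not the paper's exact value-function-based update.

```python
import numpy as np

def q_log(x, q):
    """Tsallis q-logarithm: ln_q(x) = (x^(1-q) - 1) / (1 - q), with ln_1 = ln."""
    if np.isclose(q, 1.0):
        return np.log(x)
    return (x ** (1.0 - q) - 1.0) / (1.0 - q)

def q_gaussian(q, size, rng=None):
    """Standard q-Gaussian deviates via the generalized Box-Mueller method
    (Thistleton et al., 2007). Valid for q < 3; q = 1 recovers the ordinary
    Gaussian, and 1 < q < 3 gives progressively heavier tails."""
    rng = np.random.default_rng() if rng is None else rng
    q_prime = (1.0 + q) / (3.0 - q)  # the method samples with a transformed q
    u1 = rng.uniform(size=size)
    u2 = rng.uniform(size=size)
    return np.sqrt(-2.0 * q_log(u1, q_prime)) * np.cos(2.0 * np.pi * u2)

def exploratory_control_noise(q, trajectory_cost, alpha=0.1, size=1):
    """Hypothetical cost-dependent scaling: inflate the sampling standard
    deviation when the trajectory cost (standing in for the value function)
    is high, mimicking the exploration mechanism the abstract describes.
    The linear rule and `alpha` are assumptions for illustration only."""
    sigma = 1.0 + alpha * trajectory_cost
    return sigma * q_gaussian(q, size)
```

For example, `exploratory_control_noise(q=1.5, trajectory_cost=10.0, size=5)` produces wider, heavier-tailed perturbations than the same call with a low cost, which is the qualitative behavior the abstract attributes to the Tsallis formulation.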

