Generalized Maximum Entropy Differential Dynamic Programming (2403.18130v2)
Abstract: We present a sampling-based trajectory optimization method derived from the maximum entropy formulation of Differential Dynamic Programming with Tsallis entropy. The method generalizes prior work based on Shannon entropy, which yields a Gaussian optimal control policy for exploration during optimization. With Tsallis entropy, the policy instead takes the form of a $q$-Gaussian, whose heavy-tailed shape further encourages exploration. Moreover, the sampling variance is scaled according to the value function of the trajectory. This scaling mechanism is unique to the Tsallis entropy formulation; the original Shannon entropy formulation scales the variance with a fixed temperature parameter. Because of this property, the proposed algorithm promotes exploration when it is needed, that is, when the cost of the trajectory is high. Simulation results on two robotic systems with multimodal costs demonstrate these properties.
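To make the heavy-tail claim concrete, the following is a minimal sketch of drawing scalar $q$-Gaussian samples with the generalized Box-Müller method of Thistleton et al. and comparing their tail mass with that of a Gaussian. It is an illustration only, not the paper's implementation: the function names (`q_log`, `sample_q_gaussian`, `tail_fraction`), the choice $q = 1.5$, and the threshold of 4 are assumptions made for this example, and the value-function-dependent variance scaling described in the abstract is not reproduced here.

```python
import numpy as np


def q_log(x, q):
    """Tsallis q-logarithm; reduces to the natural log when q == 1."""
    if abs(q - 1.0) < 1e-12:
        return np.log(x)
    return (x ** (1.0 - q) - 1.0) / (1.0 - q)


def sample_q_gaussian(q, size, rng):
    """Draw standard q-Gaussian deviates (q < 3) via generalized Box-Muller.

    Hypothetical helper for illustration; q == 1 recovers the ordinary
    Box-Muller transform for a standard normal.
    """
    if q >= 3.0:
        raise ValueError("q must be smaller than 3")
    q_prime = (1.0 + q) / (3.0 - q)
    u1 = 1.0 - rng.random(size)  # uniform on (0, 1], avoids log(0)
    u2 = rng.random(size)
    radius = np.sqrt(-2.0 * q_log(u1, q_prime))
    return radius * np.cos(2.0 * np.pi * u2)


def tail_fraction(samples, threshold):
    """Fraction of samples whose magnitude exceeds the threshold."""
    return float(np.mean(np.abs(samples) > threshold))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    gaussian = sample_q_gaussian(1.0, 100_000, rng)  # Shannon case
    heavy = sample_q_gaussian(1.5, 100_000, rng)     # assumed q for the demo
    print("Gaussian tail fraction:  ", tail_fraction(gaussian, 4.0))
    print("q-Gaussian tail fraction:", tail_fraction(heavy, 4.0))
```

With $q > 1$ a noticeably larger share of samples lands far from the mean than in the Gaussian case, which is the sense in which a $q$-Gaussian policy explores more aggressively for the same nominal scale.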