Reward Shaping via Diffusion Process in Reinforcement Learning (2306.11885v1)
Abstract: Reinforcement Learning (RL) models have continually evolved to navigate the exploration-exploitation trade-off in uncertain Markov Decision Processes (MDPs). In this study, I leverage the principles of stochastic thermodynamics and system dynamics to explore reward shaping via diffusion processes. This provides an elegant framework for reasoning about the exploration-exploitation trade-off. The article sheds light on the relationships between information entropy and stochastic system dynamics, and on their influence on entropy production. This analysis yields a dual-pronged framework that can be interpreted either as a maximum entropy program for deriving efficient policies or as a modified cost optimization program that accounts for informational costs and benefits. The work presents a novel perspective on the physical nature of information and its implications for online learning in MDPs, and thereby offers a better understanding of information-oriented formulations in RL.