Classical Policy Gradient: Preserving Bellman's Principle of Optimality (1906.03063v1)
Published 6 Jun 2019 in cs.LG and stat.ML
Abstract: We propose a new objective function for finite-horizon episodic Markov decision processes that better captures Bellman's principle of optimality, and provide an expression for the gradient of the objective.