
Optimal Control-Based Baseline for Guided Exploration in Policy Gradient Methods (2011.02073v5)

Published 4 Nov 2020 in cs.LG, cs.AI, cs.RO, cs.SY, and eess.SY

Abstract: In this paper, a novel optimal control-based baseline function is presented for the policy gradient method in deep reinforcement learning (RL). The baseline is obtained by computing the value function of an optimal control problem, which is formed to be closely associated with the RL task. In contrast to the traditional baseline aimed at variance reduction of policy gradient estimates, our work utilizes the optimal control value function to introduce a new role for the baseline: providing guided exploration during policy learning. This aspect is less discussed in prior work. We validate our baseline on robot learning tasks, showing its effectiveness in guided exploration, particularly in sparse reward environments.
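To make the construction concrete, below is a minimal sketch of a policy gradient loss with such a baseline. This is not the authors' implementation; `optimal_control_value` is a hypothetical stand-in for the value function of the associated optimal control problem, assumed to be precomputed or queryable per state.

```python
# Minimal sketch: REINFORCE with a value-function baseline, where the usual
# learned baseline is replaced by the value function V_oc(s) of an associated
# optimal control problem. `optimal_control_value` is a hypothetical
# placeholder, not the paper's code.
import torch

def policy_gradient_loss(log_probs, returns, states, optimal_control_value):
    """log_probs: (T,) log pi(a_t | s_t) for the sampled actions;
    returns: (T,) Monte Carlo returns G_t;
    states: (T, d) visited states."""
    with torch.no_grad():
        # Baseline b(s_t) = V_oc(s_t). Subtracting it reduces the variance of
        # the gradient estimate and, per the paper's claim, steers exploration
        # toward states the optimal control solution deems valuable.
        baseline = optimal_control_value(states)   # (T,)
        advantages = returns - baseline            # A_t = G_t - V_oc(s_t)
    # REINFORCE estimator: grad J ~ E[ A_t * grad log pi(a_t | s_t) ]
    return -(advantages * log_probs).mean()
```

Because the baseline depends only on the state, subtracting it leaves the policy gradient estimator unbiased; the choice of V_oc only changes which trajectories receive positive advantage, which is how it shapes exploration.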

Authors (5)
  1. Xubo Lyu (5 papers)
  2. Site Li (15 papers)
  3. Seth Siriya (9 papers)
  4. Ye Pu (28 papers)
  5. Mo Chen (95 papers)
