
Balancing Value Underestimation and Overestimation with Realistic Actor-Critic (2110.09712v6)

Published 19 Oct 2021 in cs.LG

Abstract: Model-free deep reinforcement learning (RL) has been successfully applied to challenging continuous control domains. However, poor sample efficiency prevents these methods from being widely used in real-world domains. This paper introduces a novel model-free algorithm, Realistic Actor-Critic (RAC), which can be combined with any off-policy RL algorithm to improve sample efficiency. RAC employs Universal Value Function Approximators (UVFA) to simultaneously learn a family of policies with the same neural network, each with a different trade-off between underestimation and overestimation. To learn such policies, we introduce uncertainty-punished Q-learning, which uses the uncertainty from an ensemble of multiple critics to build various confidence bounds of the Q-function. We evaluate RAC on the MuJoCo benchmark, achieving 10x sample efficiency and a 25% performance improvement on the most challenging Humanoid environment compared to SAC.
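
The idea of building confidence bounds from a critic ensemble can be illustrated with a minimal sketch. The abstract does not give the exact formula, so the form used here (ensemble mean minus a coefficient `beta` times the ensemble standard deviation) is an assumption, and the function name `punished_q_target` and the sample values are hypothetical, not taken from the paper.

```python
import numpy as np


def punished_q_target(q_values, beta):
    """Return an uncertainty-penalized Q estimate (illustrative only).

    q_values: array of shape (n_critics, batch), one row per critic.
    beta: penalty weight; 0 gives the plain ensemble mean, larger
          values give a lower (more pessimistic) estimate.
    Assumption: bound = mean - beta * std across the ensemble.
    """
    q_values = np.asarray(q_values, dtype=float)
    mean = q_values.mean(axis=0)
    std = q_values.std(axis=0)  # population std across critics
    return mean - beta * std


# Three hypothetical critics scoring two state-action pairs.
# The critics disagree on the first pair and agree on the second.
q = np.array([[1.0, 4.0],
              [2.0, 4.0],
              [3.0, 4.0]])

optimistic = punished_q_target(q, beta=0.0)   # plain ensemble mean
pessimistic = punished_q_target(q, beta=1.0)  # penalized by disagreement
```

Where the critics agree, the penalty vanishes and both estimates coincide; where they disagree, a larger `beta` pushes the estimate down. Conditioning a single network on `beta`, in the spirit of UVFA, would then yield one policy per trade-off.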

Authors (5)
  1. Sicen Li (2 papers)
  2. Qinyun Tang (1 paper)
  3. Yiming Pang (5 papers)
  4. Xinmeng Ma (1 paper)
  5. Gang Wang (406 papers)
Citations (3)