Papers
Topics
Authors
Recent
Search
2000 character limit reached

Utilizing Maximum Mean Discrepancy Barycenter for Propagating the Uncertainty of Value Functions in Reinforcement Learning

Published 31 Mar 2024 in cs.LG | (2404.00686v2)

Abstract: Accounting for the uncertainty of value functions boosts exploration in Reinforcement Learning (RL). Our work introduces Maximum Mean Discrepancy Q-Learning (MMD-QL) to improve Wasserstein Q-Learning (WQL) for uncertainty propagation during Temporal Difference (TD) updates. MMD-QL uses the MMD barycenter for this purpose, as MMD provides a tighter estimate of closeness between probability measures than the Wasserstein distance. Firstly, we establish that MMD-QL is Probably Approximately Correct in MDP (PAC-MDP) under the average loss metric. Concerning the accumulated rewards, experiments on tabular environments show that MMD-QL outperforms WQL and other algorithms. Secondly, we incorporate deep networks into MMD-QL to create MMD Q-Network (MMD-QN). Making reasonable assumptions, we analyze the convergence rates of MMD-QN using function approximation. Empirical results on challenging Atari games demonstrate that MMD-QN performs well compared to benchmark deep RL algorithms, highlighting its effectiveness in handling large state-action spaces.

Summary

Paper to Video (Beta)

Whiteboard

No one has generated a whiteboard explanation for this paper yet.

Open Problems

We haven't generated a list of open problems mentioned in this paper yet.

Continue Learning

We haven't generated follow-up questions for this paper yet.

Authors (2)

Collections

Sign up for free to add this paper to one or more collections.