Provably efficient reinforcement learning with function approximation

Determine whether reinforcement learning algorithms that incorporate function approximation can be designed to be provably efficient in both runtime and sample complexity, with efficiency depending on an intrinsic complexity measure of the function class rather than the number of states, thereby addressing the exploration–exploitation tradeoff in large or infinite state spaces.
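One common way to make the desired guarantee precise (a standard formalization in the literature, not quoted from the paper) is a regret bound over $K$ episodes of horizon $H$ that scales polynomially with an intrinsic complexity measure $d$ of the function class and is independent of the number of states:

$$
\mathrm{Regret}(K) \;=\; \sum_{k=1}^{K}\Bigl(V_1^{\star}(s_1^{k}) - V_1^{\pi_k}(s_1^{k})\Bigr) \;\le\; \mathrm{poly}(d, H)\cdot\sqrt{T}, \qquad T = KH,
$$

where $V_1^{\star}$ is the optimal value function, $\pi_k$ is the policy executed in episode $k$, and $s_1^{k}$ is that episode's initial state.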

Background

The paper motivates the need for provably efficient reinforcement learning algorithms that use function approximation, noting that tabular methods, whose guarantees scale with the number of states, are infeasible in large state spaces, and that function approximation introduces its own statistical and computational challenges, particularly for exploration.

The authors present LSVI-UCB, an optimistic modification of Least-Squares Value Iteration, and prove polynomial runtime and sample complexity guarantees in the specific setting of linear Markov decision processes, where transition probabilities and rewards are linear in a known feature map. While this result addresses the question in the linear case, whether such provable efficiency can be achieved with general function approximation is posed as an open problem.
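For concreteness, in a linear MDP the transition kernel and reward at each step $h$ are linear in a known feature map $\phi : \mathcal{S} \times \mathcal{A} \to \mathbb{R}^d$, i.e. $\mathbb{P}_h(\cdot \mid s, a) = \langle \phi(s,a), \mu_h(\cdot)\rangle$ and $r_h(s,a) = \langle \phi(s,a), \theta_h\rangle$, and LSVI-UCB adds a least-squares confidence bonus to a ridge-regression estimate of each $Q_h$. The sketch below illustrates that update; the environment interface, the `phi` argument, and the hyperparameter defaults are assumptions for illustration, not the paper's implementation.

```python
import numpy as np

def lsvi_ucb(env, phi, actions, d, H, K, lam=1.0, beta=1.0):
    """Minimal sketch of optimistic Least-Squares Value Iteration (LSVI-UCB).

    env     : episodic environment with reset() -> state and
              step(a) -> (next_state, reward, done)   [assumed interface]
    phi     : feature map phi(s, a) -> np.ndarray of shape (d,)
    actions : finite list of actions
    d, H, K : feature dimension, horizon, number of episodes
    lam     : ridge regularizer lambda
    beta    : bonus coefficient (the theory sets beta on the order of
              d * H * sqrt(log(d T / delta)))
    """
    # Replay buffer of (state, action, reward, next_state) tuples per step h.
    data = [[] for _ in range(H)]

    def q_value(w, Lam_inv, s, a):
        f = phi(s, a)
        bonus = beta * np.sqrt(f @ Lam_inv @ f)   # UCB exploration bonus
        return min(float(w @ f) + bonus, H)       # optimistic Q, truncated at H

    returns = []
    for k in range(K):
        # --- Backward pass: refit optimistic Q_h for h = H-1, ..., 0 ---
        w = [np.zeros(d) for _ in range(H + 1)]
        Lam_inv = [np.eye(d) / lam for _ in range(H + 1)]
        for h in reversed(range(H)):
            Lam = lam * np.eye(d)
            target_sum = np.zeros(d)
            for (s, a, r, s_next) in data[h]:
                f = phi(s, a)
                Lam += np.outer(f, f)
                # Regression target: reward plus optimistic value at step h+1.
                if h + 1 < H:
                    v_next = max(q_value(w[h + 1], Lam_inv[h + 1], s_next, b)
                                 for b in actions)
                else:
                    v_next = 0.0
                target_sum += f * (r + v_next)
            Lam_inv[h] = np.linalg.inv(Lam)
            w[h] = Lam_inv[h] @ target_sum

        # --- Forward pass: act greedily w.r.t. the optimistic Q ---
        s = env.reset()
        ep_return = 0.0
        for h in range(H):
            a = max(actions, key=lambda b: q_value(w[h], Lam_inv[h], s, b))
            s_next, r, done = env.step(a)
            data[h].append((s, a, r, s_next))
            ep_return += r
            s = s_next
            if done:
                break
        returns.append(ep_return)
    return returns
```

The bonus term $\beta \sqrt{\phi(s,a)^{\top} \Lambda_h^{-1} \phi(s,a)}$ is what drives exploration: it is large for state-action directions that are poorly covered by past data and shrinks as the regression becomes well conditioned, which is how the algorithm avoids any dependence on the number of states.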

References

A core RL question therefore remains open: how can we design provably efficient RL algorithms that incorporate function approximation? This question persists even in the basic setting with linear dynamics and linear rewards, for which only linear function approximation is needed.