Bellman Eluder Dimension in Reinforcement Learning: A Framework Analysis
The paper introduces the Bellman Eluder (BE) dimension, a new complexity measure for reinforcement learning (RL) designed to identify rich classes of RL problems that can be solved sample-efficiently. The authors show that the existing tractable problem classes, notably problems with low Bellman rank and problems with low eluder dimension, both fall under the umbrella of low BE dimension, thereby addressing the long-standing challenge of finding minimal structural assumptions that permit sample-efficient RL.
Overview of the Bellman Eluder Dimension
The BE dimension is introduced to help pin down which structural properties of an RL problem permit sample-efficient learning. It generalizes the eluder dimension to the sequential, distributional setting of RL: rather than pointwise prediction errors, it tracks average Bellman errors of candidate value functions evaluated on the state-action distributions induced by roll-in policies, measuring how long new roll-in distributions can keep revealing Bellman errors that were not predictable from those already observed.
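Concretely, the BE dimension is built from a distributional version of the eluder dimension. Paraphrasing the paper's definitions (notation lightly adapted, so the exact statement may differ slightly): a distribution $\nu$ is $\varepsilon$-independent of $\mu_1,\dots,\mu_n$ with respect to a function class $\mathcal{G}$ if some $g \in \mathcal{G}$ satisfies $\sqrt{\sum_{i=1}^{n}(\mathbb{E}_{\mu_i}[g])^2} \le \varepsilon$ yet $|\mathbb{E}_{\nu}[g]| > \varepsilon$; the distributional eluder dimension $\dim_{\mathrm{DE}}(\mathcal{G}, \Pi, \varepsilon)$ is the length of the longest sequence of distributions in $\Pi$ in which each element is $\varepsilon'$-independent of its predecessors for some $\varepsilon' \ge \varepsilon$. The BE dimension applies this to Bellman residuals:

$$ \dim_{\mathrm{BE}}(\mathcal{F}, \Pi, \varepsilon) \;=\; \max_{h \in [H]} \dim_{\mathrm{DE}}\big((I - \mathcal{T}_h)\mathcal{F},\ \Pi_h,\ \varepsilon\big), \qquad (I - \mathcal{T}_h)\mathcal{F} := \{\, f_h - \mathcal{T}_h f_{h+1} : f \in \mathcal{F} \,\}, $$

where $\mathcal{T}_h$ is the Bellman operator at step $h$ and $\Pi_h$ contains the step-$h$ state-action distributions induced by roll-in policies.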
Key Contributions
- Unified Framework: The BE dimension unifies previous notions such as Bellman rank and eluder dimension, offering a broader perspective without becoming too permissive. Its scope covers tabular MDPs, linear function approximation, and several settings with nonlinear function approximators.
- Algorithm Development: The paper proposes an optimization-based algorithm named GOLF (Global Optimism based on Local Fitting). GOLF maintains a confidence set of candidate value functions whose empirical Bellman error on the data collected so far is close to minimal, acts greedily with respect to the most optimistic candidate in that set, and uses the resulting trajectories to refine the set; a schematic sketch is given after this list.
- Analysis of OLIVE: The paper revisits the OLIVE algorithm, originally developed for low Bellman rank problems, and shows that it applies to the broader class of low BE dimension problems. OLIVE is a hypothesis elimination method: in each round it follows the most optimistic surviving value function and eliminates candidates with a large average Bellman error on the induced roll-in distribution, until the surviving optimistic candidate is guaranteed to be near-optimal.
- Comprehensive Sample Complexity Results: By showing that both GOLF and OLIVE achieve polynomial sample complexity bounds, the authors make a convincing case for the BE dimension as a practical measure. The bounds scale with the BE dimension, the horizon, and the (log-)size of the function class, with no explicit dependence on the size of the state-action space.
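To make the GOLF-style procedure concrete, here is a minimal schematic sketch. It is not the authors' implementation: the tiny two-step environment, the enumerated class of Q-tables, the confidence radius `BETA`, and the episode count are placeholder assumptions chosen only to keep the example self-contained and runnable; the actual algorithm is stated over an abstract function class, with a confidence radius dictated by the theory.

```python
"""Schematic GOLF-style sketch: confidence set of value functions with small
empirical Bellman error, optimistic selection, greedy roll-out.

Everything environment- and representation-specific (the toy two-step MDP,
Q-tables over a tiny state-action grid, BETA, EPISODES) is an illustrative
assumption, not taken from the paper."""
import itertools
import random

H = 2                                   # horizon
STATES, ACTIONS = [0, 1], [0, 1]
BETA = 0.5                              # confidence-set radius (hypothetical)
EPISODES = 25

def env_step(s, a):
    """Toy dynamics: reward 1 when the action matches the state, random next state."""
    return random.choice(STATES), float(a == s)

# Candidate per-step Q-tables: every map from (state, action) to {0, 1}.
KEYS = [(s, a) for s in STATES for a in ACTIONS]
TABLES = [dict(zip(KEYS, vals))
          for vals in itertools.product([0.0, 1.0], repeat=len(KEYS))]
ZERO = {k: 0.0 for k in KEYS}           # terminal values are zero
# Function class F: one table per step, plus the terminal zero table.
F = [list(fs) + [ZERO] for fs in itertools.product(TABLES, repeat=H)]

def loss(q_h, q_next, data_h):
    """Empirical squared Bellman error of (q_h, q_next) on the step-h dataset."""
    return sum((q_h[(s, a)] - (r + max(q_next[(s2, b)] for b in ACTIONS))) ** 2
               for (s, a, r, s2) in data_h)

def golf():
    data = [[] for _ in range(H)]       # one dataset per step
    f = F[0]
    for _ in range(EPISODES):
        # Confidence set: candidates whose Bellman loss is near-minimal at every
        # step, with regression targets built from the candidate's own next-step table.
        conf_set = [g for g in F
                    if all(loss(g[h], g[h + 1], data[h])
                           <= min(loss(q, g[h + 1], data[h]) for q in TABLES) + BETA
                           for h in range(H))]
        # Optimism: pick the candidate with the largest value at the initial state.
        s = random.choice(STATES)
        f = max(conf_set, key=lambda g: max(g[0][(s, a)] for a in ACTIONS))
        # Roll out the greedy policy of the optimistic candidate and log transitions.
        for h in range(H):
            a = max(ACTIONS, key=lambda b: f[h][(s, b)])
            s2, r = env_step(s, a)
            data[h].append((s, a, r, s2))
            s = s2
    return f

if __name__ == "__main__":
    print(golf())
```

The structural elements mirror the description above: a confidence set defined by near-minimal empirical Bellman error at every step, optimistic selection at the initial state, and data collection with the greedy policy of the chosen candidate.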
Implications for Reinforcement Learning
These results matter for applying RL to problems with large state spaces. The BE dimension provides a common yardstick under which existing RL approaches can be analyzed, and it narrows the gap between theory and practice by delineating classes of RL problems with provable sample-efficiency guarantees.
Future Directions
The paper opens avenues for further research into identifying additional problem classes with low BE dimension. It also raises algorithmic challenges, particularly around computational feasibility: GOLF and OLIVE are sample-efficient but not, in general, computationally efficient when the state and action spaces are large and the function class is complex. Future work could extend the framework to richer settings, including partially observable models, further validating the BE dimension across diverse domains.
This framework and its implications lay valuable groundwork for new algorithms and theoretical insights, establishing the BE dimension as a useful lens for understanding when reinforcement learning can be made sample-efficient.