Establishing the generality of the Nibbler approach beyond constructed environments

Establish the generality of the Nibbler algorithm and the associated scaling methodology for model-free reinforcement learning with unstructured observations by determining whether the observed performance and scaling behavior extend beyond the constructed multi-catch environments used in the experiments.

Background

The paper introduces Nibbler, a model-free reinforcement learning algorithm that constructs and learns general value function (GVF) subproblems to build nonlinear features from unstructured observations. The empirical evaluation focuses on synthetic, combinatorial multi-catch environments designed to expose scalable structure.
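This section does not say how Nibbler learns each subproblem. For readers unfamiliar with GVFs, the following is a minimal sketch of a generic GVF prediction learned with linear TD(0). A GVF poses a question through a cumulant (the signal to accumulate) and a continuation factor (how far ahead to look). The sketch assumes an on-policy setting and fixed feature vectors. The class `LinearGVF` and its parameters are hypothetical. This is not Nibbler's implementation, and it does not show how Nibbler constructs subproblems or nonlinear features.

```python
from typing import List, Sequence


class LinearGVF:
    """Hypothetical on-policy GVF learner (not Nibbler's code).

    Predicts the discounted sum of a cumulant, where the discount is given
    by a per-step continuation factor, using a linear function of features
    and the TD(0) update.
    """

    def __init__(self, num_features: int, step_size: float = 0.1) -> None:
        self.weights: List[float] = [0.0] * num_features
        self.step_size = step_size

    def predict(self, features: Sequence[float]) -> float:
        # Linear prediction: dot product of weights and features.
        return sum(w * x for w, x in zip(self.weights, features))

    def update(
        self,
        features: Sequence[float],
        cumulant: float,
        continuation: float,
        next_features: Sequence[float],
    ) -> float:
        # TD error: cumulant plus continued next prediction minus current prediction.
        td_error = (
            cumulant
            + continuation * self.predict(next_features)
            - self.predict(features)
        )
        # Move weights along the current features in proportion to the TD error.
        self.weights = [
            w + self.step_size * td_error * x
            for w, x in zip(self.weights, features)
        ]
        return td_error


# Usage: a constant cumulant of 1 with continuation 0.5 has fixed point 1 / (1 - 0.5) = 2.
gvf = LinearGVF(num_features=1, step_size=0.1)
for _ in range(1000):
    gvf.update([1.0], cumulant=1.0, continuation=0.5, next_features=[1.0])
print(round(gvf.predict([1.0]), 6))  # 2.0
```

Each subproblem in such a scheme would be one cumulant and continuation pair. Many such learners can share features and be updated in parallel from the same stream of experience.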

While the results show favorable scaling and performance relative to deep RL baselines in these constructed domains, the authors explicitly note that the presented experiments do not establish the generality of the approach, because the measures, the algorithms, and the domain were all constructed to capture the phenomena of interest. Whether the same benefits hold in other domains remains open.

References

The generality of the approach remains unclear from these experiments, as we have constructed the measures, the algorithms, and the domain to capture the primary facets of the phenomena of interest.

Towards model-free RL algorithms that scale well with unstructured data (2311.02215 - Modayil et al., 2023) in Section 7 (Discussion)