Scaling benefits of large in-context RL demonstrations

Determine whether providing large language models with thousands of state–action–reward trajectories as in-context examples improves their mental modeling of reinforcement learning agents, compared to using only a limited number of examples.

Background

The paper evaluates whether LLMs can infer and explain reinforcement learning agents’ behavior and environment dynamics from state–action–reward histories using the LLM-Xavier framework. Experiments primarily use limited in-context histories to probe next-action and state-transition understanding across several classic control and robotics tasks.

In the limitations section, the authors explicitly note uncertainty about how performance scales with substantially larger in-context demonstration sets. They hypothesize that much larger prompts containing many trajectories, or domain-specific fine-tuning, might improve mental modeling capabilities, but this remains unresolved and is left for future analysis.
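
As a rough illustration of the comparison left open here, the following is a minimal sketch of how limited versus large in-context demonstration sets could be constructed. The plain-text trajectory serialization and the `build_prompt` helper are assumptions for illustration, not the paper's actual LLM-Xavier prompt format.

```python
# Hypothetical sketch: serialize N state–action–reward trajectories into an
# in-context prompt and ask the LLM to predict the agent's next action.
# The prompt wording and helper names are assumptions, not the paper's code.
from typing import List, Tuple

Step = Tuple[List[float], int, float]   # (state, action, reward)
Trajectory = List[Step]

def build_prompt(trajectories: List[Trajectory],
                 query_state: List[float],
                 n_demos: int) -> str:
    """Pack the first n_demos trajectories as demonstrations, then ask
    which action the agent would take in query_state."""
    lines = ["You observe an RL agent interacting with an environment."]
    for i, traj in enumerate(trajectories[:n_demos]):
        lines.append(f"Trajectory {i}:")
        for state, action, reward in traj:
            lines.append(f"  state={state} action={action} reward={reward}")
    lines.append(f"Given state={query_state}, which action does the agent take?")
    return "\n".join(lines)

# Comparing a limited prompt (e.g., 10 trajectories) against a much larger
# one (e.g., thousands) would probe the scaling question, subject to the
# model's context-window limit:
# few_shot_prompt  = build_prompt(all_trajectories, s_query, n_demos=10)
# many_shot_prompt = build_prompt(all_trajectories, s_query, n_demos=2000)
```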

References

It remains unclear whether LLMs can benefit from thousands of agent trajectories compared to the limited number of examples studied in this paper.

Mental Modeling of Reinforcement Learning Agents by Language Models (2406.18505 - Lu et al., 26 Jun 2024) in Limitations