Scaling benefits of large in-context RL demonstrations
Determine whether providing large language models (LLMs) with thousands of state–action–reward trajectories as in-context examples improves their ability to mentally model reinforcement learning agents, compared with the limited number of examples studied in this paper.
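A practical obstacle to this question is fitting thousands of trajectories into a finite context window. The sketch below shows one illustrative way to serialize state–action–reward trajectories into an in-context prompt under a token budget; the function names, the prompt format, and the characters-per-token heuristic are assumptions for illustration, not the paper's actual pipeline.

```python
def format_trajectory(trajectory):
    """Render one trajectory, a list of (state, action, reward) tuples,
    as a plain-text demonstration block."""
    lines = [f"s={s} a={a} r={r}" for s, a, r in trajectory]
    return "Trajectory:\n" + "\n".join(lines)

def build_prompt(trajectories, question, max_tokens=8000, chars_per_token=4):
    """Pack as many demonstrations as the context budget allows,
    then append the mental-modeling question.

    Returns (prompt, number_of_demonstrations_included). The
    chars_per_token factor is a rough heuristic, not a real tokenizer.
    """
    budget = max_tokens * chars_per_token - len(question)
    blocks, used = [], 0
    for traj in trajectories:
        block = format_trajectory(traj)
        if used + len(block) > budget:
            break  # context window exhausted; remaining examples are dropped
        blocks.append(block)
        used += len(block) + 2  # +2 for the blank-line separator
    return "\n\n".join(blocks) + "\n\n" + question, len(blocks)
```

The returned count makes the scaling question concrete: as the trajectory pool grows into the thousands, the budget (not the pool size) determines how many demonstrations the model actually sees, which is why long-context models or trajectory summarization would be needed to test the hypothesis.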
References
It remains unclear whether LLMs can benefit from thousands of agent trajectories compared to the limited number of examples studied in this paper.
— Mental Modeling of Reinforcement Learning Agents by Language Models
(Lu et al., arXiv:2406.18505, 26 Jun 2024), Limitations section