Mitigating large language model inference bottlenecks in RL-based Hanabi agents
Determine effective methods to alleviate the inference-time bottleneck that arises when large language models are integrated into reinforcement learning agents that interact frequently with the environment. Specifically, consider the R3D2 agent for Hanabi, which uses language models to encode observations and actions, and identify techniques that maintain or improve gameplay performance and sample efficiency.
References
It is worth noting that larger text encoders could also be integrated; however, the inference cost of LLMs can become a bottleneck~\citep{kaplan2020scalinglawsneurallanguage}, which remains an open area of research.
— A Generalist Hanabi Agent
(2503.14555 - Sudhakar et al., 17 Mar 2025) in Section 4.2 (R3D2 Agent: Handling dynamic state and action space)
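One commonly considered mitigation (not described in the source, offered here as an assumption) is caching encoder outputs: Hanabi observations and actions are built from a small, fixed vocabulary of text templates, so memoizing the text encoder amortizes its cost across environment steps. The sketch below uses a hypothetical stand-in `encode_text` for the expensive LLM encoder and `functools.lru_cache` for the memoization layer; all names are illustrative, not from the paper.

```python
from functools import lru_cache
import hashlib

calls = {"n": 0}  # counts how often the expensive encoder actually runs

def encode_text(text):
    # Hypothetical stand-in for an expensive LLM text encoder:
    # returns a deterministic pseudo-embedding for the input string.
    calls["n"] += 1
    digest = hashlib.sha256(text.encode()).digest()
    return tuple(b / 255.0 for b in digest[:8])

@lru_cache(maxsize=None)
def encode_cached(text):
    # Because Hanabi's observation/action strings come from a finite
    # template vocabulary, repeated strings hit the cache instead of
    # re-running the encoder at every environment step.
    return encode_text(text)

obs = ["play card 1", "discard card 2", "play card 1", "play card 1"]
embeddings = [encode_cached(o) for o in obs]
print(calls["n"])  # → 2: the encoder ran once per unique string
```

This only helps when the text space is small and reused, which holds for templated encodings like R3D2's but not for free-form text; for the latter, distillation into a smaller encoder or batching encoder calls are the usual alternatives.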