Mitigating large language model inference bottlenecks in RL-based Hanabi agents

Determine effective methods to alleviate the inference-time bottleneck caused by integrating large language models into reinforcement learning agents that interact frequently with the environment, specifically when using language models to encode observations and actions in the R3D2 agent for Hanabi, while maintaining or improving gameplay performance and sample efficiency.

Background

R3D2 integrates an LLM to encode textual observations and actions for Hanabi, enabling dynamic action and state spaces that support generalization across player counts. The authors chose a two-layer TinyBERT for its balance of representation quality and inference time, the latter being critical in reinforcement learning loops with high interaction frequency.
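To illustrate why text encoding enables a dynamic action space, the sketch below embeds an observation string and a variable-size set of textual action descriptions, then scores each action against the observation embedding. The encoder here is a hypothetical stand-in (a hashing bag-of-words) for a small model like TinyBERT; all names and the scoring scheme are illustrative assumptions, not R3D2's actual architecture.

```python
import numpy as np

DIM = 64  # embedding size (illustrative)

def encode_text(text: str, dim: int = DIM) -> np.ndarray:
    """Toy stand-in for a small text encoder: a hashing bag-of-words.
    Deterministic within a process; a real agent would run TinyBERT here."""
    vec = np.zeros(dim)
    for token in text.lower().split():
        rng = np.random.default_rng(abs(hash(token)) % (2**32))
        vec += rng.standard_normal(dim)
    norm = np.linalg.norm(vec)
    return vec / norm if norm > 0 else vec

def score_actions(observation: str, actions: list[str]) -> np.ndarray:
    """Score however many textual actions are currently legal against the
    observation embedding -- the action set can grow or shrink freely,
    which is what 'dynamic action space' buys you."""
    obs = encode_text(observation)
    return np.array([encode_text(a) @ obs for a in actions])

obs = "partner hinted red, my second card is playable"
actions = ["play card 2", "discard card 0", "hint blue to partner"]
scores = score_actions(obs, actions)
best = actions[int(np.argmax(scores))]
```

Because actions are plain strings, the same network can be reused across player counts or rule variants without changing its output head, at the cost of one encoder forward pass per candidate action.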

They note that larger text encoders could in principle be integrated to improve representation quality. However, LLM inference cost can become a significant bottleneck for training and evaluation throughput in RL, and this limitation hinders scaling to more capable encoders within the R3D2 framework.
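One commonly discussed mitigation, sketched below under assumptions not stated in the paper, is to memoize encoder outputs: Hanabi's textual action descriptions (and many observation strings) recur across the millions of environment steps, so caching embeddings amortizes the encoder's forward passes. `encode_cached` is a hypothetical stand-in for any expensive text encoder.

```python
from functools import lru_cache
import numpy as np

@lru_cache(maxsize=100_000)
def encode_cached(text: str) -> tuple[float, ...]:
    # Stand-in for an expensive LLM forward pass. Returning a tuple keeps
    # the cached value hashable and immutable.
    rng = np.random.default_rng(abs(hash(text)) % (2**32))
    return tuple(rng.standard_normal(8))

# Repeated strings hit the cache instead of re-running the encoder:
for _ in range(1000):
    encode_cached("play card 2")
info = encode_cached.cache_info()
# info.misses == 1, info.hits == 999 for this string
```

Caching helps only to the extent that strings repeat exactly; other directions left open by the authors' remark include batching encoder calls across parallel environments or distilling a larger encoder into a small one offline.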

References

It is worth noting that larger text encoders could also be integrated; however, the inference cost of LLMs can become a bottleneck~\citep{kaplan2020scalinglawsneurallanguage}, which remains an open area of research.

A Generalist Hanabi Agent (2503.14555 - Sudhakar et al., 17 Mar 2025) in Section 4.2 (R3D2 Agent: Handling dynamic state and action space)