- The paper presents a system that uses a code-generation LLM to convert natural language commands into executable game behavior branches.
- The methodology integrates Unity, Photon PUN2, AWS, and the llama-v2-34b-code model to achieve real-time command processing and smooth multiplayer synchronization.
- The system logs every command and corresponding behavior branch in DynamoDB, enabling detailed performance evaluations and iterative improvements.
Commanding Game Agents Using Natural Language with Code-Generation LLMs
Introduction
The gaming world has seen incredible advancements over the years, but one area that has always posed challenges is enabling more natural interactions between players and game agents. Imagine games where you can simply type what you want your character to do, and the character executes your commands flawlessly. This paper presents a novel approach that pushes the boundaries of interactivity by using a code-generation LLM to translate free-form text commands into game actions.
System Overview
Components of the System
The system is made up of several key components that work together to provide a seamless interactive experience:
- Unity (2022.3.15f1): This tool handles the game environment and graphical interface for players.
- Photon PUN2 (2.45): It's responsible for real-time network synchronization between the players' game instances, ensuring a smooth multiplayer experience.
- AWS Server: This backend powerhouse manages player authentication (using Cognito), logs all commands and actions (storing them in DynamoDB), and interfaces with the LLM API to generate the behavior branches from player commands.
- Fireworks AI API: The 'llama-v2-34b-code' model is employed here for its rapid response time, crucial for maintaining an engaging gameplay experience.
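To make the backend's role concrete, here is a minimal sketch of the request the AWS server might assemble for the code-generation model. The system prompt, the chat-style payload shape, and the `build_llm_request` helper are assumptions for illustration; the paper does not publish its actual prompt or request format.

```python
import json

# Hypothetical system prompt: the paper does not disclose the real one.
SYSTEM_PROMPT = (
    "Translate the player's command into a behavior branch: a tree of "
    "condition, action, and control nodes, expressed as JSON."
)

def build_llm_request(command: str, model: str = "llama-v2-34b-code") -> str:
    """Serialize a chat-style request payload for the LLM API (assumed shape)."""
    payload = {
        "model": model,
        "messages": [
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": command},
        ],
        "temperature": 0.0,  # deterministic translation is desirable in a game
    }
    return json.dumps(payload)

request_body = build_llm_request("use thunderbolt when the enemy is far away")
```

The low temperature reflects a design pressure the paper implies: the same command should reliably produce the same behavior branch.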
Game Environment
In the game, players control agents in a 3D space. These game agents can perform actions like:
- Thunderbolt: A ranged attack where the agent shoots an energy sphere at the opponent.
- Iron Tail: A melee attack involving a powerful tail swing.
- Tackle: A rushing movement hitting the opponent directly.
Players input commands through a straightforward text interface, and the game pauses momentarily to process these inputs. This ensures that each command is translated and executed accurately.
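The pause-translate-execute flow above can be sketched as a small gate around the translation step. The `CommandGate` class and its stand-in translator are hypothetical; the real system performs the translation via the LLM round trip described earlier.

```python
# Minimal sketch of the pause-translate-execute loop, with a stub translator
# standing in for the real LLM round trip.

class CommandGate:
    """Pauses gameplay while a typed command is translated and applied."""

    def __init__(self, translate):
        self.translate = translate  # maps command text -> behavior branch
        self.paused = False

    def submit(self, command: str):
        self.paused = True          # freeze the match while input is processed
        branch = self.translate(command)
        self.paused = False         # resume once the branch is installed
        return branch

# Stub translator: just pull the action name out of the command.
gate = CommandGate(lambda cmd: {"action": cmd.split()[-1]})
branch = gate.submit("use tackle")
```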
Command-Action Translation
The translation of text commands into game actions is the heart of this system. The process involves converting player inputs into what's called "behavior branches," which are tree structures chaining conditions and actions. These branches have:
- Action Nodes: Specify the action to be executed by the game agent.
- Condition Nodes: Direct the flow based on whether the specified conditions are met.
- Control Nodes: Manage the execution flow of actions.
This method leverages the structural approach found in programming, allowing for more dynamic and varied behaviors compared to traditional hard-coded algorithms.
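A minimal sketch of such a behavior branch follows, assuming a simple node protocol in which each node's `tick(state)` returns the action(s) to perform. The class names, the `state` dictionary, and the choice of a sequence as the control node are illustrative assumptions, not the paper's implementation.

```python
class ActionNode:
    """Leaf node: specifies the action the agent should execute."""
    def __init__(self, action):
        self.action = action
    def tick(self, state):
        return self.action

class ConditionNode:
    """Routes execution flow based on whether a condition holds."""
    def __init__(self, predicate, if_true, if_false):
        self.predicate, self.if_true, self.if_false = predicate, if_true, if_false
    def tick(self, state):
        branch = self.if_true if self.predicate(state) else self.if_false
        return branch.tick(state)

class SequenceNode:
    """One possible control node: run children in order, collect results."""
    def __init__(self, children):
        self.children = children
    def tick(self, state):
        return [child.tick(state) for child in self.children]

# "Use Thunderbolt at range, otherwise Iron Tail" as a behavior branch:
branch = ConditionNode(
    predicate=lambda s: s["distance"] > 5,
    if_true=ActionNode("thunderbolt"),
    if_false=ActionNode("iron_tail"),
)
```

Because the LLM emits a tree rather than a single action, one command can encode conditional tactics that a fixed command-to-action lookup table could not express.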
Logs and Evaluation
To assess the system's performance, all commands and their subsequent translations are logged in DynamoDB. The logs capture pertinent details like:
- Session ID
- Timestamp
- Player's ID
- Original Command
- Translated Behavior Branch
These logs are invaluable for both debugging and iterative improvement, providing detailed insights into how commands are translated and executed.
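A log entry with the fields listed above might be assembled as follows. The attribute names, key choices, and millisecond timestamp are assumptions; the paper specifies only which details are recorded, not the table schema.

```python
import time

def build_log_item(session_id, player_id, command, behavior_branch):
    """Build a DynamoDB-style item for one command translation (assumed schema)."""
    return {
        "session_id": session_id,              # assumed partition key
        "timestamp": int(time.time() * 1000),  # assumed sort key, milliseconds
        "player_id": player_id,
        "command": command,
        "behavior_branch": behavior_branch,    # serialized tree from the LLM
    }

item = build_log_item("sess-42", "player-1", "use tackle", '{"action": "tackle"}')
# With boto3, such an item would be written via table.put_item(Item=item).
```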
Demonstration
For practical validation, a live demonstration involves two players engaging in a battle using this system. The rules and commands are explained, and players type commands to control their agents, aiming to defeat their opponent's game agent. This interactive demo helps showcase the flexibility and responsiveness of the system in real-world scenarios.
Conclusions and Future Work
In conclusion, this paper demonstrates the feasibility of using a code-generation LLM to revolutionize player-agent interaction in games. By translating free-form text commands into sophisticated actions, this system offers a glimpse into the future of gaming where natural language could become the primary mode of interaction.
Future work will involve more comprehensive quantitative and qualitative analyses to fine-tune the system further. Enhancements may include reducing latency, expanding the range of possible commands, and improving the natural language understanding capabilities to make this technology even more practical for wider industry adoption.