Self-playing Adversarial Language Game Enhances LLM Reasoning (2404.10642v3)

Published 16 Apr 2024 in cs.CL and cs.LG

Abstract: We explore the potential of self-play training for LLMs in a two-player adversarial language game called Adversarial Taboo. In this game, an attacker and a defender communicate around a target word only visible to the attacker. The attacker aims to induce the defender to speak the target word unconsciously, while the defender tries to infer the target word from the attacker's utterances. To win the game, both players must have sufficient knowledge about the target word and high-level reasoning ability to infer and express in this information-reserved conversation. Hence, we are curious about whether LLMs' reasoning ability can be further enhanced by Self-Playing this Adversarial language Game (SPAG). With this goal, we select several open-source LLMs and let each act as the attacker and play with a copy of itself as the defender on an extensive range of target words. Through reinforcement learning on the game outcomes, we observe that the LLMs' performances uniformly improve on a broad range of reasoning benchmarks. Furthermore, iteratively adopting this self-play process can continuously promote LLMs' reasoning abilities. The code is available at https://github.com/Linear95/SPAG.

Summary

The paper introduces SPAG, a self-play adversarial language game that enhances LLM reasoning without relying on human-generated data.
It combines imitation learning with reinforcement learning from self-play episodes to iteratively improve performance across multiple reasoning benchmarks.
Experiments with models like LLaMA-2-7B and Baichuan-2-13B demonstrate consistent gains in reasoning and strategic gameplay against GPT-4.

Enhancing Reasoning in LLMs through Adversarial Language Game Self-Play

Introduction to SPAG

Recent LLMs such as GPT-4 and LLaMA have driven remarkable progress in natural language understanding and generation, yet improving their reasoning abilities remains a significant challenge. This paper introduces Self-Playing Adversarial language Game (SPAG), an approach that aims to improve LLMs' reasoning capabilities without relying on human-generated data by having them self-play a two-player adversarial language game known as Adversarial Taboo.

Adversarial Taboo and Self-Play

In the Adversarial Taboo game, an "attacker" and a "defender", both played by LLMs, hold a dialogue around a target word that only the attacker can see. The attacker tries to induce the defender to say the target word unconsciously, while the defender tries to infer the target word from the attacker's utterances. Both roles therefore require solid knowledge of the target word as well as the strategic reasoning needed to steer the conversation. The paper exploits this structure through self-play: each LLM acts as the attacker against a copy of itself as the defender, and learns from the game outcomes through reinforcement learning.
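
To make the win conditions concrete, the following is a minimal Python sketch of an outcome judge for a single episode. The summary above does not spell out the exact rules, so everything here is an assumption made for illustration: the guess phrase "I know the word! It is ...", the rule that a wrong guess loses the game for the defender, the rule that an attacker who utters the target word forfeits, the turn limit, and the names judge_episode, GUESS_PATTERN and max_turns are all hypothetical.

```python
import re
from typing import List, Tuple

# Assumed guess format; the real game may use a different phrase.
GUESS_PATTERN = re.compile(r"i know the word! it is\s+[`'\"]?([a-z]+)", re.IGNORECASE)


def judge_episode(target: str, turns: List[Tuple[str, str]], max_turns: int = 10) -> str:
    """Return the winner of one episode: 'attacker', 'defender', or 'tie'.

    `turns` is a list of (role, utterance) pairs with role in
    {'attacker', 'defender'}, in the order they were spoken.
    """
    target = target.lower()
    # Whole-word, case-insensitive match for the target word.
    word = re.compile(rf"\b{re.escape(target)}\b", re.IGNORECASE)

    for role, utterance in turns[:max_turns]:
        if role == "attacker":
            # Assumption: the attacker may not say the target word itself.
            if word.search(utterance):
                return "defender"
        else:
            guess = GUESS_PATTERN.search(utterance)
            if guess:
                # An explicit guess ends the game: right wins, wrong loses.
                return "defender" if guess.group(1).lower() == target else "attacker"
            if word.search(utterance):
                # The defender said the word without guessing it.
                return "attacker"
    # Nobody won within the turn limit.
    return "tie"
```

For example, if the target word is "apple" and the defender answers a breakfast question with "Usually an apple and toast.", this judge awards the episode to the attacker; if the defender instead says "I know the word! It is apple", the defender wins.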

Reinforcement Learning (RL) Approach

The SPAG methodology involves initially preparing LLMs to follow the game's rules through imitation learning based on dialogues generated by other advanced models like GPT-4. Following this, it engages the models in self-play iterations, where they accumulate experience by playing numerous game episodes against themselves. The outcomes of these episodes guide the reinforcement learning process, aiming to iteratively refine and elevate the LLMs' reasoning abilities. Importantly, the paper adapts offline reinforcement learning methods to overcome inefficiencies associated with online learning in the context of natural text generation.
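
As a rough illustration of how game outcomes could be turned into offline training data, the sketch below converts one finished episode into per-turn weighted samples. It is not the paper's actual objective: the zero-sum rewards of +1/-1/0, the discount factor gamma, the rule of keeping only positively weighted turns, and the names Sample, outcome_rewards, episode_to_samples and select_winning_samples are all assumptions chosen to keep the example small.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

ROLES = ("attacker", "defender")


@dataclass
class Sample:
    role: str        # which player produced the response
    context: str     # dialogue history visible before the response
    response: str    # the utterance to be (de-)emphasized in training
    weight: float    # discounted, signed outcome reward


def outcome_rewards(outcome: str) -> Dict[str, float]:
    """Assumed zero-sum rewards: winner +1, loser -1, tie 0 for both."""
    if outcome == "tie":
        return {role: 0.0 for role in ROLES}
    if outcome not in ROLES:
        raise ValueError(f"unknown outcome: {outcome!r}")
    loser = "defender" if outcome == "attacker" else "attacker"
    return {outcome: 1.0, loser: -1.0}


def episode_to_samples(
    turns: List[Tuple[str, str]], outcome: str, gamma: float = 0.8
) -> List[Sample]:
    """Assign each turn its player's reward, discounted by distance to the end."""
    rewards = outcome_rewards(outcome)
    samples: List[Sample] = []
    history: List[str] = []
    last = len(turns) - 1
    for t, (role, utterance) in enumerate(turns):
        weight = rewards[role] * gamma ** (last - t)
        samples.append(Sample(role, "\n".join(history), utterance, weight))
        history.append(f"{role}: {utterance}")
    return samples


def select_winning_samples(samples: List[Sample]) -> List[Sample]:
    """Keep only turns with positive weight, i.e. the winner's moves."""
    return [s for s in samples if s.weight > 0]
```

Because the samples are collected once from a fixed policy and then reused for training, no new text has to be generated during the update itself, which is the efficiency argument for the offline setting described above.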

Empirical Validation

Two open-source pre-trained models, LLaMA-2-7B and Baichuan-2-13B, were used to test the SPAG approach. The evaluation covered several reasoning benchmarks, including BIG-Bench Hard, ARC Easy and Challenge, MuTual, WinoGrande, LogiQA2, and PIQA, along with MMLU as a general language understanding benchmark. Reasoning performance improved across multiple benchmarks as the models underwent successive self-play epochs. Additionally, when the SPAG-trained models were pitted against GPT-4 in the Adversarial Taboo game, their win rates improved consistently, indicating stronger gameplay strategy and, by extension, stronger reasoning abilities.

Discussion and Future Directions

The findings suggest that engaging LLMs in strategic adversarial language games through self-play offers a viable path to enhancing their reasoning skills, beyond the capabilities acquired through conventional training methods. This approach, inspired by the success of self-play in developing strategic game-playing AIs like AlphaGo, underscores the potential of adversarial language games in advancing AI reasoning without relying on human-generated training data.

Given the significant performance gains observed, future work could explore the application of SPAG to a broader range of LLM architectures and reasoning tasks. Additionally, further refinement of the self-play and reinforcement learning methodologies could unlock even greater advancements in LLM reasoning capabilities. This paper represents a compelling step toward realizing more sophisticated, strategic, and reasoning-capable LLMs.

Concluding Remarks

In summary, SPAG is a promising method for enhancing the reasoning abilities of LLMs, and it makes a case for adversarial language games and self-play in AI development. Its gains across several benchmarks, together with its direct effect on game-playing strategy, point to its potential as a tool for developing AI reasoning skills.