Self-playing Adversarial Language Game Enhances LLM Reasoning (2404.10642v3)
Abstract: We explore the potential of self-play training for LLMs in a two-player adversarial language game called Adversarial Taboo. In this game, an attacker and a defender communicate around a target word that only the attacker can see. The attacker aims to induce the defender to speak the target word unconsciously, while the defender tries to infer the target word from the attacker's utterances. To win the game, both players need sufficient knowledge about the target word and high-level reasoning ability to make inferences and express themselves in this information-restricted conversation. Hence, we ask whether LLMs' reasoning ability can be further enhanced by Self-Playing this Adversarial language Game (SPAG). To this end, we select several open-source LLMs and let each act as the attacker and play against a copy of itself as the defender on an extensive range of target words. Through reinforcement learning on the game outcomes, we observe that the LLMs' performance uniformly improves on a broad range of reasoning benchmarks. Furthermore, iteratively repeating this self-play process continues to improve LLMs' reasoning abilities. The code is available at https://github.com/Linear95/SPAG.
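The abstract says only that the attacker tries to make the defender say the target word and that the defender tries to infer it. To make "game outcomes" concrete, here is a minimal referee sketch that labels one episode and maps the label to zero-sum rewards. The abstract does not fix the judging rules, so several are assumptions for illustration and do not come from the source. Assumed: the guess phrase `"I know the word! It is <word>"`; a wrong explicit guess loses; an attacker who says the target word loses; an episode with no decisive event is a tie; rewards are +1/-1/0. Targets are assumed to be single alphabetic words. The names `judge`, `rewards`, and `GUESS_PREFIX` are hypothetical and are not taken from the SPAG repository. How the paper turns these outcomes into a reinforcement-learning objective is not shown here.

```python
import re
from typing import Dict, List

# Hypothetical guess phrase; the exact wording is an assumption of this sketch.
GUESS_PREFIX = "i know the word! it is"


def _tokens(text: str) -> List[str]:
    # Lowercase alphabetic word tokens (a real judge might also lemmatize).
    return re.findall(r"[a-z]+", text.lower())


def judge(target: str, turns: List[str]) -> str:
    """Return 'attacker', 'defender', or 'tie' for one episode.

    turns[0], turns[2], ... are attacker utterances;
    turns[1], turns[3], ... are defender utterances.
    """
    target = target.lower()
    for i, utterance in enumerate(turns):
        text = utterance.lower()
        if i % 2 == 0:  # attacker's turn
            if target in _tokens(text):
                return "defender"  # assumed rule: attacker said the word outright
        else:  # defender's turn
            if GUESS_PREFIX in text:
                guess = _tokens(text.split(GUESS_PREFIX, 1)[1])
                # Explicit guess: a correct first word wins, anything else loses.
                return "defender" if guess[:1] == [target] else "attacker"
            if target in _tokens(text):
                return "attacker"  # defender spoke the target word unconsciously
    return "tie"  # no decisive event within the given turns


def rewards(outcome: str) -> Dict[str, float]:
    # Assumed zero-sum outcome rewards: winner +1, loser -1, tie 0.
    if outcome == "tie":
        return {"attacker": 0.0, "defender": 0.0}
    loser = "defender" if outcome == "attacker" else "attacker"
    return {outcome: 1.0, loser: -1.0}


if __name__ == "__main__":
    episode = [
        "Let's talk about fruit that keeps doctors away.",
        "Are you hinting at something red and crunchy?",
        "Exactly, something you might bake into a pie.",
        "I know the word! It is apple!",
    ]
    print(judge("apple", episode))           # defender
    print(rewards(judge("apple", episode)))  # {'defender': 1.0, 'attacker': -1.0}
```

Checking for an explicit guess before checking for an unconscious mention matters in this design. A correct guess necessarily contains the target word, so the order decides whether that turn counts as a defender win or an attacker win.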