Eliciting Better Multilingual Structured Reasoning from LLMs through Code (2403.02567v2)
Abstract: The development of large language models (LLMs) has shown progress on reasoning, though studies have largely considered either English or simple reasoning tasks. To address this, we introduce a multilingual structured reasoning and explanation dataset, termed xSTREET, that covers four tasks across six languages. xSTREET exposes a gap in base LLM performance between English and non-English reasoning tasks. We then propose two methods to remedy this gap, building on the insight that LLMs trained on code are better reasoners. First, at training time, we augment a code dataset with multilingual comments using machine translation while keeping the program code as-is. Second, at inference time, we bridge the gap between training and inference by employing a prompt structure that incorporates step-by-step code primitives to derive new facts and find a solution. Our methods show improved multilingual performance on xSTREET, most notably on the scientific commonsense reasoning subtask. Furthermore, the models show no regression on non-reasoning tasks, demonstrating that our techniques maintain general-purpose abilities.
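The training-time augmentation can be pictured as a comment-rewriting pass over source files. The sketch below is a minimal illustration of that idea and not the authors' released pipeline: it leaves the program tokens untouched and routes only the `#` comments through a caller-supplied machine-translation function (`translate_fn`, a hypothetical placeholder for whatever MT system is used).

```python
# Minimal sketch (assumed, not the paper's code): translate the natural-language
# comments in a Python snippet while keeping the program code exactly as-is.
import io
import tokenize
from typing import Callable


def translate_comments(source: str, translate_fn: Callable[[str], str]) -> str:
    """Return `source` with every '#' comment replaced by its translation."""
    result = []
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type == tokenize.COMMENT:
            text = tok.string.lstrip("#").strip()          # drop the leading '#'
            result.append(tok._replace(string="# " + translate_fn(text)))
        else:
            result.append(tok)                               # code tokens unchanged
    return tokenize.untokenize(result)


if __name__ == "__main__":
    snippet = (
        "total = 0\n"
        "for x in values:  # accumulate the running sum\n"
        "    total += x\n"
    )
    # Stub translator for illustration; a real pipeline would call an MT model here.
    fake_mt = lambda s: f"[es] {s}"
    print(translate_comments(snippet, fake_mt))
```

Run with the stub translator, the example prints the same program with only its comment rewritten, mirroring how the augmented training data pairs unchanged code with translated natural-language context.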
Authors: Bryan Li, Tamer Alkhouli, Daniele Bonadiman, Nikolaos Pappas, Saab Mansour