Speak It Out: Solving Symbol-Related Problems with Symbol-to-Language Conversion for Language Models (2401.11725v2)
Abstract: Symbols (or, more broadly, non-natural-language textual representations) such as numerical sequences, molecular formulas, and table delimiters are widespread and play important roles in tasks such as abstract reasoning, chemical property prediction, and table question answering. Despite the impressive natural language comprehension capabilities of LLMs, their reasoning abilities over symbols remain inadequate, which could be attributed to the difference between symbolic representations and general natural language. We propose symbol-to-language (S2L), a tuning-free method that enables LLMs to solve symbol-related problems with information expressed in natural language. Specifically, S2L first converts the symbols involved into language-based representations, which can be implemented by prompting LLMs or leveraging external tools. These language-based representations are then integrated into the original problem via direct substitution or concatenation, serving as useful input information for LLMs. We evaluate S2L using both API-based (GPT-4, ChatGPT) and open-source (OpenChat) models on eight symbol-related tasks, ranging from symbol-only abstract reasoning to sentiment analysis on social media. Experimental results show that S2L consistently leads to superior performance. For example, applying S2L to GPT-4 yields significant average improvements of +21.9% and +9.5% on subtasks of 1D-ARC and the Dyck language, respectively. Code and data are available at https://github.com/THUNLP-MT/symbol2language.
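The abstract describes S2L as a two-step pipeline: convert symbols into language-based descriptions (via an LLM prompt or an external tool), then feed those descriptions back into the original problem by substitution or concatenation. Below is a minimal Python sketch of that pipeline under stated assumptions: the `llm` callable, the prompts, and the function names are illustrative placeholders and are not taken from the paper's repository.

```python
from typing import Callable, List

def symbol_to_language(symbol: str, llm: Callable[[str], str]) -> str:
    """Ask a model (or, alternatively, an external tool) to describe a
    symbolic input in plain natural language. Prompt wording is a placeholder."""
    prompt = (
        "Describe the following symbolic input in natural language, "
        "stating its structure and content explicitly:\n" + symbol
    )
    return llm(prompt)

def solve_with_s2l(problem: str, symbols: List[str],
                   llm: Callable[[str], str], mode: str = "concatenate") -> str:
    """Convert each symbol to a language-based description, then either
    substitute the descriptions into the problem text or concatenate them
    as additional context before querying the model."""
    descriptions = {s: symbol_to_language(s, llm) for s in symbols}
    if mode == "substitute":
        augmented = problem
        for s, d in descriptions.items():
            augmented = augmented.replace(s, d)
    else:  # concatenate descriptions as auxiliary information
        notes = "\n".join(f"- {s}: {d}" for s, d in descriptions.items())
        augmented = f"{problem}\n\nAdditional information:\n{notes}"
    return llm(augmented)
```

In practice, `llm` would wrap whichever backbone is being evaluated (e.g., GPT-4, ChatGPT, or OpenChat); keeping it as a plain callable leaves the choice of API and prompting details open.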
Authors: Yile Wang, Sijie Cheng, Zixin Sun, Peng Li, Yang Liu