Natural Language Embedded Programs for Hybrid Language Symbolic Reasoning (2309.10814v2)

Published 19 Sep 2023 in cs.CL

Abstract: How can we perform computations over natural language representations to solve tasks that require symbolic and numeric reasoning? We propose natural language embedded programs (NLEP) as a unifying framework for addressing math/symbolic reasoning, natural language understanding, and instruction following tasks. Our approach prompts an LLM to generate full Python programs that define functions over data structures which contain natural language representations of structured knowledge. A Python interpreter then executes the generated code and prints the output. Despite using a task-general prompt, we find that this approach can improve upon strong baselines across a range of different tasks including math and symbolic reasoning, text classification, question answering, and instruction following. We also find that the generated programs are interpretable, since they outline the exact reasoning process followed by the program interpreter.
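
To make the NLEP format concrete, below is a minimal sketch of the kind of Python program such a prompt might elicit for a simple numeric word problem. This is an illustrative reconstruction under stated assumptions, not an example taken from the paper: the task, the knowledge structure, and names like knowledge and total_apples are hypothetical.

# Hypothetical NLEP-style program (illustrative; not from the paper).
# Step 1: structured knowledge, with natural language statements stored
# in an ordinary Python data structure.
knowledge = [
    {"person": "Ada", "fact": "Ada has 3 apples."},
    {"person": "Ben", "fact": "Ben has twice as many apples as Ada."},
]

# Step 2: a function that makes the symbolic/numeric reasoning explicit.
def total_apples(entries):
    counts = {"Ada": 3}                 # transcribed from Ada's fact above
    counts["Ben"] = 2 * counts["Ada"]   # "twice as many apples as Ada"
    return sum(counts.values())

# Step 3: execute the reasoning and print the output, as the Python
# interpreter does when running the generated program.
print("Total apples:", total_apples(knowledge))

Running this program prints "Total apples: 9". Because every reasoning step is explicit code, the trace can be inspected line by line, which is the interpretability property the abstract highlights.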

