Arithmetic Reasoning with LLM: Prolog Generation & Permutation (2405.17893v1)

Published 28 May 2024 in cs.CL and cs.AI

Abstract: Instructing LLMs to solve elementary school math problems has shown great success using Chain of Thought (CoT). However, the CoT approach relies on an LLM to generate a sequence of arithmetic calculations, which can be prone to cascaded calculation errors. We hypothesize that an LLM should instead focus on extracting predicates and generating symbolic formulas from the math problem description, so that the underlying calculation can be done via an external code interpreter. We investigate using an LLM to generate Prolog programs to solve mathematical questions. Experimental results show that our Prolog-based arithmetic problem solving outperforms CoT generation on the GSM8K benchmark across three distinct LLMs. In addition, since Prolog is insensitive to the ordering of predicates and symbolic formulas, we propose permuting the ground-truth predicates as a data-augmentation step for more robust LLM training.
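To make the pipeline described in the abstract concrete, the minimal sketch below (an illustration under stated assumptions, not the paper's released code) hard-codes the kind of Prolog program an LLM would be expected to emit for a GSM8K-style question and delegates the arithmetic to an external interpreter. It assumes SWI-Prolog is installed and reachable as `swipl`; the predicate names (`clips_sold`, `total`, `main`) are hypothetical, not the paper's exact schema.

```python
# Minimal sketch of the "LLM writes Prolog, an external interpreter does the
# arithmetic" pipeline. Requires SWI-Prolog installed and reachable as `swipl`.
import os
import subprocess
import tempfile

# GSM8K-style question: "Natalia sold clips to 48 of her friends in April, and
# then she sold half as many clips in May. How many clips did Natalia sell
# altogether in April and May?"  The program below stands in for the LLM
# output; predicate names are illustrative, not the paper's exact schema.
GENERATED_PROLOG = """
clips_sold(april, 48).
clips_sold(may, M) :- clips_sold(april, A), M is A / 2.
total(T) :- clips_sold(april, A), clips_sold(may, M), T is A + M.
main :- total(T), format("~w~n", [T]).
"""

def run_prolog(program: str) -> str:
    """Consult the generated program with SWI-Prolog and return its stdout."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "solution.pl")
        with open(path, "w") as f:
            f.write(program)
        result = subprocess.run(
            ["swipl", "-q", "-g", "main", "-t", "halt", path],
            capture_output=True, text=True, timeout=10,
        )
        return result.stdout.strip()

if __name__ == "__main__":
    print(run_prolog(GENERATED_PROLOG))  # expected output: 72
```

Running the script should print 72; the answer is computed by the interpreter rather than by the LLM, which is the point of separating symbolic extraction from calculation.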

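The permutation-based augmentation can be sketched in the same spirit: because the declarative reading of such an arithmetic Prolog program does not depend on clause order, shuffling the ground-truth clauses yields additional, semantically equivalent training targets. The function below is a hedged illustration (its name and bounds are assumptions, not the paper's released code); some interpreters warn when clauses of the same predicate become non-contiguous, but the computed answer is unchanged.

```python
# Hedged sketch of permutation-based data augmentation: shuffle the clauses of
# a ground-truth Prolog program to produce equivalent training targets.
import random

def permute_clauses(clauses, num_variants=4, seed=0):
    """Return up to `num_variants` distinct clause orderings, original first."""
    rng = random.Random(seed)
    variants = [tuple(clauses)]
    seen = set(variants)
    attempts = 0
    # Bounded random shuffles: duplicates are skipped, so very short programs
    # simply yield fewer variants instead of looping forever.
    while len(variants) < num_variants and attempts < 50 * num_variants:
        attempts += 1
        shuffled = list(clauses)
        rng.shuffle(shuffled)
        key = tuple(shuffled)
        if key not in seen:
            seen.add(key)
            variants.append(key)
    return [list(v) for v in variants]

if __name__ == "__main__":
    ground_truth = [
        "clips_sold(april, 48).",
        "clips_sold(may, M) :- clips_sold(april, A), M is A / 2.",
        "total(T) :- clips_sold(april, A), clips_sold(may, M), T is A + M.",
    ]
    for variant in permute_clauses(ground_truth):
        print("\n".join(variant), "\n")
```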
Authors (3)
  1. Xiaocheng Yang (11 papers)
  2. Bingsen Chen (2 papers)
  3. Yik-Cheung Tam (8 papers)
Citations (5)