
On the Empirical Complexity of Reasoning and Planning in LLMs (2404.11041v2)

Published 17 Apr 2024 in cs.AI and cs.LG

Abstract: Chain-of-thought (CoT), tree-of-thought (ToT), and related techniques work surprisingly well in practice for some complex reasoning tasks with LLMs, but why? This work seeks the underlying reasons by conducting experimental case studies and linking the performance benefits to well-established sample and computational complexity principles in machine learning. We experimented with 6 reasoning tasks, ranging from grade school math, air travel planning, ..., to Blocksworld. The results suggest that (i) both CoT and ToT benefit significantly from task decomposition, which breaks a complex reasoning task into a sequence of steps with low sample complexity and explicitly outlines the reasoning structure, and (ii) for computationally hard reasoning tasks, the more sophisticated tree structure of ToT outperforms the linear structure of CoT. These findings provide useful guidelines for the use of LLMs in solving reasoning tasks in practice.
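The contrast the abstract draws between CoT's linear structure and ToT's tree structure can be sketched on a toy planning task. The task, move set, and greedy heuristic below are illustrative assumptions for the sketch, not the paper's experimental setup: a linear chain commits to one step at a time, while a tree search explores alternative branches and can recover from a locally attractive but globally wrong first move.

```python
from itertools import product

# Toy planning task: reach a target value from a start value using the
# moves "+3" and "*2" in a fixed number of steps (an illustrative stand-in
# for the paper's planning benchmarks, not its actual setup).
MOVES = {"+3": lambda x: x + 3, "*2": lambda x: x * 2}

def linear_chain(start, target, steps):
    """CoT-like linear structure: greedily commit to the single move
    that brings the current value closest to the target."""
    x, plan = start, []
    for _ in range(steps):
        name, fn = min(MOVES.items(), key=lambda kv: abs(kv[1](x) - target))
        x, plan = fn(x), plan + [name]
    return x, plan

def tree_search(start, target, steps):
    """ToT-like tree structure: explore every move sequence and keep
    the one whose final value is closest to the target."""
    best = (abs(start - target), start, [])
    for seq in product(MOVES.items(), repeat=steps):
        x, plan = start, []
        for name, fn in seq:
            x, plan = fn(x), plan + [name]
        best = min(best, (abs(x - target), x, plan))
    return best[1], best[2]

# Greedy linear search takes "+3" first and gets stuck; the tree search
# finds the "*2" branch that reaches the target exactly.
print(linear_chain(1, 5, 2))  # (7, ['+3', '+3'])
print(tree_search(1, 5, 2))   # (5, ['*2', '+3'])
```

The tree search pays an exponential cost in the number of steps (here, all 2^steps sequences), which mirrors the paper's point that the extra structure is worthwhile mainly for computationally hard tasks where the linear chain's greedy commitments fail.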

Authors (4)
  1. Liwei Kang
  2. Zirui Zhao
  3. David Hsu
  4. Wee Sun Lee
Citations (3)