CREATOR: Tool Creation for Disentangling Abstract and Concrete Reasoning of Large Language Models (2305.14318v3)

Published 23 May 2023 in cs.CL

Abstract: LLMs have made significant progress in utilizing tools, but their ability is limited by API availability and the instability of implicit reasoning, particularly when both planning and execution are involved. To overcome these limitations, we propose CREATOR, a novel framework that enables LLMs to create their own tools through documentation and code realization. CREATOR disentangles abstract tool creation from concrete decision execution, resulting in improved performance. We evaluate CREATOR on the MATH and TabMWP benchmarks, which consist of challenging math competition problems and diverse tabular contents, respectively. Remarkably, CREATOR outperforms existing chain-of-thought, program-of-thought, and tool-using baselines. Additionally, we introduce the Creation Challenge dataset, featuring 2K diverse questions, to emphasize the necessity and benefits of LLMs' tool creation ability. Further research demonstrates that leveraging LLMs as tool creators facilitates knowledge transfer, and that LLMs exhibit varying levels of tool creation ability, enabling them to adapt to diverse situations. The tool creation ability revolutionizes the LLM's problem-solving paradigm, driving us closer to the next frontier of artificial intelligence. All code and data are released.
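
The abstract describes CREATOR as separating abstract tool creation (writing a documented, reusable piece of code) from concrete decision execution (applying that tool to the specific question). As a rough illustration only, the sketch below shows one way such a two-stage pipeline could be wired up; `call_llm`, `solve_with_created_tool`, and the prompts are hypothetical placeholders, not the paper's actual prompts or implementation.

```python
import contextlib
import io


def call_llm(prompt: str) -> str:
    """Placeholder: return the model's completion for `prompt`.

    Assumption: any chat/completion backend can be plugged in here.
    """
    raise NotImplementedError("plug in an LLM client here")


def solve_with_created_tool(question: str) -> str:
    # Stage 1 (abstract): ask the model to create a general-purpose tool,
    # i.e. a self-contained, documented Python function that captures the
    # structure of the problem class.
    tool_code = call_llm(
        "Write a self-contained Python function (with a docstring) that can "
        f"solve problems like the following:\n{question}\n"
        "Return only the code."
    )

    # Stage 2 (concrete): ask the model to write code that calls the tool
    # on this specific question and prints the final answer.
    usage_code = call_llm(
        f"Given this tool:\n{tool_code}\n"
        f"Write Python code that uses it to answer:\n{question}\n"
        "Print the final answer. Return only the code."
    )

    # Execute the tool definition and the usage code together, capturing
    # whatever the usage code prints as the answer.
    namespace: dict = {}
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):
        exec(tool_code + "\n" + usage_code, namespace)
    return buffer.getvalue().strip()
```

In this sketch the tool-writing step never sees execution details, and the usage step only decides how to apply the tool, which mirrors the disentanglement the abstract emphasizes; the real framework also handles execution feedback and rectification beyond what is shown here.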

