The Benefits of a Concise Chain of Thought on Problem-Solving in Large Language Models (2401.05618v3)
Published 11 Jan 2024 in cs.CL and cs.AI
Abstract: In this paper, we introduce Concise Chain-of-Thought (CCoT) prompting. We compared standard CoT and CCoT prompts to see how conciseness affects response length and correct-answer accuracy, evaluating GPT-3.5 and GPT-4 on a multiple-choice question-and-answer (MCQA) benchmark. CCoT reduced average response length by 48.70% for both GPT-3.5 and GPT-4 while having a negligible impact on problem-solving performance. However, on math problems, GPT-3.5 with CCoT incurred a performance penalty of 27.69%. Overall, CCoT led to an average per-token cost reduction of 22.67%. All code, data, and supplemental materials are available on GitHub at https://github.com/matthewrenze/jhu-concise-cot
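The contrast between standard CoT and CCoT prompting, and how a shorter response translates into lower output-token cost, can be sketched as follows. This is a minimal illustration only: the instruction wording, the `build_prompt` and `output_cost` helpers, and the per-1K-token price are assumptions for demonstration, not the paper's exact prompts or pricing.

```python
# Illustrative sketch of standard CoT vs. Concise CoT (CCoT) prompting.
# Prompt wording and the price figure are assumptions, not the paper's.

COT_INSTRUCTION = "Think step by step to answer the following question."
CCOT_INSTRUCTION = (
    "Think step by step, but be concise: keep your reasoning as short "
    "as possible before giving the final answer."
)


def build_prompt(instruction: str, question: str) -> str:
    """Prepend the reasoning instruction to an MCQA question."""
    return f"{instruction}\n\nQuestion: {question}\nAnswer:"


def output_cost(response_tokens: int, price_per_1k_tokens: float) -> float:
    """Cost of a response billed at a fixed price per 1,000 output tokens."""
    return response_tokens / 1000 * price_per_1k_tokens


# If CCoT shortens responses by ~48.7% (the paper's reported average),
# output-token spend falls proportionally at a fixed per-token price.
cot_tokens = 1000
ccot_tokens = round(cot_tokens * (1 - 0.487))  # 513 tokens

print(build_prompt(CCOT_INSTRUCTION, "What is 2 + 2? (A) 3 (B) 4"))
print(output_cost(cot_tokens, 0.03))   # cost of the standard CoT response
print(output_cost(ccot_tokens, 0.03))  # cost of the shorter CCoT response
```

Note that response-length savings apply only to output tokens; the CCoT instruction itself adds a few input tokens, which is one reason the paper's overall cost reduction (22.67%) is smaller than the response-length reduction (48.70%).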