Large Language Models are Contrastive Reasoners (2403.08211v2)

Published 13 Mar 2024 in cs.CL and cs.AI

Abstract: Prompting methods play a crucial role in enhancing the capabilities of pre-trained LLMs. We explore how contrastive prompting (CP) significantly improves the ability of LLMs to perform complex reasoning. We demonstrate that LLMs are decent contrastive reasoners by simply adding "Let's give a correct and a wrong answer." before LLMs provide answers. Experiments on various LLMs show that zero-shot contrastive prompting improves performance on a range of arithmetic, commonsense, and symbolic reasoning tasks without any hand-crafted few-shot examples, such as increasing the accuracy on GSM8K from 35.9% to 88.8% and AQUA-RAT from 41.3% to 62.2% with the state-of-the-art GPT-4 model. Our method not only surpasses zero-shot CoT and few-shot CoT in most arithmetic and commonsense reasoning tasks but also can seamlessly integrate with existing prompting methods, resulting in improved or comparable results when compared to state-of-the-art methods. Our code is available at https://github.com/yao8839836/cp

Enhancing LLMs' Reasoning Capabilities through Contrastive Prompting

Introduction to Contrastive Prompting

Recent developments in LLMs have showcased their potential to tackle a wide array of complex reasoning tasks, yet the quest to further refine their reasoning and problem-solving abilities continues. The paper introduces a prompting strategy termed "Contrastive Prompting" (CP), aimed at significantly enhancing the reasoning capabilities of LLMs such as GPT-4 across arithmetic, commonsense, and symbolic reasoning tasks. By directing the model to produce both a correct and an incorrect response in its output, CP marks a notable advance in prompting methodology. Without any hand-crafted few-shot examples, it yields substantial gains, for instance raising GPT-4's accuracy on GSM8K from 35.9% to 88.8% and on AQUA-RAT from 41.3% to 62.2%.
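
As a minimal illustration, the zero-shot CP prompt can be formed by appending the trigger sentence quoted in the abstract after the question; the helper name below is illustrative and not taken from the paper's released code.

```python
# Minimal sketch of the zero-shot contrastive prompt.
# The trigger sentence is quoted from the abstract; `build_cp_prompt` is our
# illustrative helper, not a function from the paper's repository.

CP_TRIGGER = "Let's give a correct and a wrong answer."

def build_cp_prompt(question: str) -> str:
    """Pose the question and append the contrastive trigger."""
    return f"Q: {question}\nA: {CP_TRIGGER}"

if __name__ == "__main__":
    print(build_cp_prompt(
        "If there are 3 cars in the parking lot and 2 more arrive, "
        "how many cars are in the parking lot?"
    ))
```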

Addressing Challenges in Current Prompting Paradigms

CP is positioned against existing prompting techniques, most notably Chain-of-Thought (CoT) prompting, which has shown promise but faces limitations: inaccurate intermediate reasoning steps in the zero-shot setting and labor-intensive manual construction of exemplars for diverse tasks in the few-shot setting. CP sidesteps these hurdles by guiding LLMs to generate both correct and incorrect answers on their own, enriching their self-evaluation and fostering a deeper understanding of the task at hand.

Methodology: Implementing Contrastive Prompting

CP is structured as a two-stage prompting process: the model is first prompted to articulate a reasoning chain that culminates in both a correct and an incorrect answer, and a second prompt then extracts the correct answer from that generated response (a minimal sketch follows below). This framework removes the need for pre-labeled examples and inherently encourages the model to identify and analyze potential errors in its own reasoning.
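
Below is a hedged sketch of that two-stage pipeline. `call_llm` stands in for whatever chat or completion client is used, and the stage-two extraction wording is a paraphrase rather than the paper's exact template.

```python
# Sketch of the two-stage CP pipeline described above.
# `call_llm` is a placeholder for any text-in/text-out model client; the
# stage-two extraction wording is our paraphrase, not the paper's exact template.

from typing import Callable

CP_TRIGGER = "Let's give a correct and a wrong answer."

def contrastive_prompting(question: str, call_llm: Callable[[str], str]) -> str:
    # Stage 1: elicit reasoning that ends in both a correct and a wrong answer.
    stage1_prompt = f"Q: {question}\nA: {CP_TRIGGER}"
    contrastive_reasoning = call_llm(stage1_prompt)

    # Stage 2: feed the contrastive reasoning back and extract only the correct answer.
    stage2_prompt = (
        f"{stage1_prompt}\n{contrastive_reasoning}\n"
        "Therefore, the correct answer is"
    )
    return call_llm(stage2_prompt)
```

Any model wrapper with a string-to-string interface can be passed as `call_llm`, so the same pipeline applies to API-hosted and local models alike.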

Experimental Validation and Insights

The evaluation of CP across twelve datasets spanning arithmetic, commonsense, symbolic, and other logical reasoning tasks supports its efficacy. Notably, CP outperforms zero-shot CoT and few-shot CoT in most arithmetic and commonsense reasoning tasks and combines readily with state-of-the-art techniques, suggesting its potential as a general enhancement to current LLM prompting strategies.

Comparative Analysis with Current Methodologies

Compared with a range of baselines, including Few-shot-CoT and several state-of-the-art prompting strategies, CP delivers superior or comparable performance. Its integration with Few-shot-CoT is particularly effective, yielding improved or comparable results relative to state-of-the-art methods on datasets such as GSM8K, AQUA-RAT, and SVAMP with GPT-4 (a sketch of the combined prompt appears below). This comparison positions CP as a highly effective approach for improving LLMs' reasoning capabilities.
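
One way the combination could look is sketched here: hand-written CoT exemplars are prepended and the CP trigger is appended to the new question. The exemplar shown is a generic CoT demonstration for illustration, not one of the paper's prompts.

```python
# Illustrative sketch of pairing Few-shot-CoT exemplars with the CP trigger.
# The exemplar is a generic hand-written CoT demonstration, not taken from the paper.

CP_TRIGGER = "Let's give a correct and a wrong answer."

FEW_SHOT_COT_EXEMPLARS = """\
Q: There are 15 trees in the grove. Grove workers plant more trees today.
After they are done, there are 21 trees. How many trees did they plant?
A: There are 15 trees originally, and 21 trees after planting.
So the workers planted 21 - 15 = 6 trees. The answer is 6.
"""

def build_fewshot_cp_prompt(question: str) -> str:
    """Prepend CoT exemplars, then pose the new question with the CP trigger."""
    return f"{FEW_SHOT_COT_EXEMPLARS}\nQ: {question}\nA: {CP_TRIGGER}"
```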

Theoretical Implications and Future Trajectories

The success of CP may be attributable to its alignment with the intrinsic learning mechanisms of LLMs, potentially tapping into patterns formed during their extensive pre-training on diverse textual data. Looking ahead, there is ample scope for investigating CP across various model sizes, mitigating potential biases in the generated content, and exploring synergies with other advanced prompting techniques. An in-depth analysis of how CP influences the internal representations of LLMs could offer further insight into the sources of its effectiveness.

Conclusion

The introduction of Contrastive Prompting heralds a significant step forward in the refinement of LLMs for complex reasoning tasks. By enabling models to generate and evaluate both correct and incorrect answers, CP not only enhances their accuracy across a broad spectrum of challenges but also opens new avenues for research into more efficient and effective prompting techniques. The practical and theoretical implications of this methodology pave the way for future breakthroughs in the ever-evolving landscape of generative AI and LLMs.

Author: Liang Yao