Distillation Contrastive Decoding: Improving LLMs Reasoning with Contrastive Decoding and Distillation (2402.14874v2)
Abstract: We propose a straightforward approach called Distillation Contrastive Decoding (DCD) to enhance the reasoning capabilities of LLMs during inference. In contrast to previous approaches that relied on smaller amateur models or analysis of hidden state differences, DCD employs Contrastive Chain-of-thought Prompting and advanced distillation techniques, including Dropout and Quantization. This approach effectively addresses a limitation of Contrastive Decoding (CD), which typically requires both an expert and an amateur model and therefore increases computational resource demands. By integrating contrastive prompts with distillation, DCD obviates the need for a separate amateur model and reduces memory usage. Our evaluations demonstrate that DCD significantly enhances LLM performance across a range of reasoning benchmarks, surpassing both CD and existing methods on the GSM8K and StrategyQA datasets.
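The abstract does not spell out the scoring rule, so the following is only a minimal sketch of the general contrastive-decoding step that DCD builds on. It assumes the commonly used form: a plausibility cutoff `alpha` on the expert distribution and a contrast weight `beta`. The function names and default values are hypothetical, not taken from the paper. In a DCD-style setup, `amateur_logits` would come from the same model run with a contrastive chain-of-thought prompt and dropout or quantization, rather than from a second, smaller model; here both logit vectors are simply passed in. Log-probabilities are used instead of raw logits, which shifts every surviving token's score by the same constant and so leaves the ranking unchanged.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax over a list of floats."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def contrastive_scores(expert_logits, amateur_logits, alpha=0.1, beta=0.5):
    """Contrastive-decoding-style token scores (illustrative sketch).

    alpha and beta are assumed hyperparameters, not values from the paper.
    Tokens whose expert probability is below alpha * (max expert probability)
    are masked out with -inf; the rest are scored as
    (1 + beta) * expert_logprob - beta * amateur_logprob.
    """
    expert_lp = log_softmax(expert_logits)
    amateur_lp = log_softmax(amateur_logits)
    cutoff = math.log(alpha) + max(expert_lp)
    scores = []
    for e, a in zip(expert_lp, amateur_lp):
        if e < cutoff:
            scores.append(float("-inf"))  # implausible under the expert
        else:
            scores.append((1 + beta) * e - beta * a)
    return scores


def greedy_pick(scores):
    """Index of the highest-scoring token."""
    return max(range(len(scores)), key=scores.__getitem__)


# Toy vocabulary of three tokens. The expert slightly prefers token 0,
# while the amateur pass strongly prefers tokens 0 and 2.
expert = [2.0, 1.9, -5.0]
amateur = [2.0, 0.0, 3.0]
scores = contrastive_scores(expert, amateur)
print(greedy_pick(expert), greedy_pick(scores))
```

In this toy case the contrast moves the choice away from the token the amateur pass also favors, and the cutoff prevents a token the expert considers implausible from being promoted merely because the amateur dislikes it.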