Expediting and Elevating Large Language Model Reasoning via Hidden Chain-of-Thought Decoding (2409.08561v1)

Published 13 Sep 2024 in cs.CL and cs.AI

Abstract: LLMs have demonstrated remarkable capabilities in tasks requiring reasoning and multi-step problem-solving through the use of chain-of-thought (CoT) prompting. However, generating the full CoT process results in significantly longer output sequences, leading to increased computational costs and latency during inference. To address this challenge, we propose a novel approach to compress the CoT process through semantic alignment, enabling more efficient decoding while preserving the benefits of CoT reasoning. Our method introduces an auxiliary CoT model that learns to generate and compress the full thought process into a compact special token representation semantically aligned with the original CoT output. This compressed representation is then integrated into the input of the Hidden Chain-of-Thought (HCoT) model. The training process follows a two-stage procedure: First, the CoT model is optimized to generate the compressed token representations aligned with the ground-truth CoT outputs using a contrastive loss. Subsequently, with the CoT model parameters frozen, the HCoT model is fine-tuned to generate accurate subsequent predictions conditioned on the prefix instruction and the compressed CoT representations from the CoT model. Extensive experiments across three challenging domains - mathematical reasoning, agent invocation, and question answering - demonstrate that our semantic compression approach achieves competitive or improved performance compared to the full CoT baseline, while providing significant speedups of at least 1.5x in decoding time. Moreover, incorporating contrastive learning objectives further enhances the quality of the compressed representations, leading to better CoT prompting and improved task accuracy. Our work paves the way for more efficient exploitation of multi-step reasoning capabilities in LLMs across a wide range of applications.

Expediting and Elevating LLM Reasoning via Hidden Chain-of-Thought Decoding

"Expediting and Elevating LLM Reasoning via Hidden Chain-of-Thought Decoding" addresses the computational inefficiencies inherent in the Chain-of-Thought (CoT) prompting framework when applied to LLMs. Traditional CoT approaches, while enhancing the reasoning capabilities of LLMs, suffer from increased computational costs and latency due to the extended output sequences they generate. This paper introduces a novel solution, the Hidden Chain-of-Thought (HCoT) framework, designed to compress the CoT process through semantic alignment, thereby improving efficiency without sacrificing performance.

Methodology Overview

The authors propose a two-stage fine-tuning methodology to achieve their goals:

  1. Auxiliary CoT Model Training: This initial stage trains an auxiliary CoT model to generate a compressed representation of the reasoning process using a contrastive loss. These compressed representations are stored as special tokens, referred to as [CoT].
  2. HCoT Model Fine-Tuning: The second stage involves fine-tuning the primary HCoT model to generate final predictions based on the prefix instruction and the compressed CoT representations from the auxiliary model.

The key innovation lies in leveraging contrastive learning to enhance the semantic compression of the reasoning process, which the HCoT model then uses during inference. By treating the intermediate reasoning steps as condensed semantic tokens, the model significantly reduces the sequence length while retaining reasoning effectiveness.
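The sketch below shows, in simplified PyTorch, how such a two-stage recipe could look: an auxiliary compressor pools a full rationale into a few [CoT] slots under a contrastive alignment objective, and a second stage fine-tunes a downstream decoder on the prefix plus the frozen compressed slots. The module shapes, the InfoNCE-style loss, and all hyperparameters are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch of the two-stage HCoT training recipe; toy modules stand in for real LLMs.
import torch
import torch.nn as nn
import torch.nn.functional as F

D, K = 512, 8  # hidden size and number of compressed [CoT] slots (assumed values)

class CoTCompressor(nn.Module):
    """Auxiliary CoT model: reads the full rationale and emits K compressed [CoT] embeddings."""
    def __init__(self):
        super().__init__()
        self.encoder = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(D, nhead=8, batch_first=True), num_layers=2)
        self.queries = nn.Parameter(torch.randn(K, D))  # learned slots for the compressed tokens

    def forward(self, cot_embeds):                       # cot_embeds: (B, T_cot, D)
        h = self.encoder(cot_embeds)
        # Attention-style pooling of the rationale into K compressed slots.
        attn = torch.softmax(self.queries @ h.transpose(1, 2) / D**0.5, dim=-1)  # (B, K, T_cot)
        return attn @ h                                   # (B, K, D)

def contrastive_alignment_loss(z_compressed, z_full, temperature=0.07):
    """InfoNCE-style loss (assumed form): pull each example's compressed summary toward
    its own full-CoT embedding and push it away from other examples in the batch."""
    a = F.normalize(z_compressed.mean(dim=1), dim=-1)     # (B, D)
    b = F.normalize(z_full.mean(dim=1), dim=-1)           # (B, D)
    logits = a @ b.t() / temperature                      # (B, B)
    return F.cross_entropy(logits, torch.arange(a.size(0)))

# Stage 1: optimize the auxiliary CoT model so its [CoT] slots align with the ground-truth rationale.
compressor = CoTCompressor()
opt1 = torch.optim.AdamW(compressor.parameters(), lr=1e-4)
cot_embeds = torch.randn(4, 64, D)                        # stand-in for embedded ground-truth CoT text
opt1.zero_grad()
loss1 = contrastive_alignment_loss(compressor(cot_embeds), cot_embeds)
loss1.backward(); opt1.step()

# Stage 2: freeze the CoT model, fine-tune the HCoT decoder on [prefix ; compressed CoT] inputs.
for p in compressor.parameters():
    p.requires_grad_(False)

hcot_decoder = nn.TransformerEncoder(                     # stands in for the HCoT language model
    nn.TransformerEncoderLayer(D, nhead=8, batch_first=True), num_layers=2)
lm_head = nn.Linear(D, 32000)                             # vocabulary size is a placeholder
opt2 = torch.optim.AdamW(list(hcot_decoder.parameters()) + list(lm_head.parameters()), lr=1e-4)

prefix = torch.randn(4, 32, D)                            # embedded instruction / question
inputs = torch.cat([prefix, compressor(cot_embeds)], dim=1)   # prefix followed by K [CoT] slots
opt2.zero_grad()
logits = lm_head(hcot_decoder(inputs))[:, -1, :]          # predict the next answer token
loss2 = F.cross_entropy(logits, torch.randint(0, 32000, (4,)))
loss2.backward(); opt2.step()
```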

Experimental Evaluation

The experimental setup includes extensive evaluations across three challenging domains: mathematical reasoning, agent invocation, and question answering. Four datasets (GSM8K, MATH, ScienceQA, and HotpotQA) were used to benchmark the performance of the proposed HCoT framework against traditional CoT approaches and other relevant baselines.

Key Results:

  • The HCoT models demonstrated performance competitive with or superior to the full CoT baselines.
  • The HCoT approach achieved significant speedups of 1.5x to 3.8x in decoding time while maintaining or enhancing task accuracy.
  • Contrastive learning objectives in the auxiliary CoT model were found to further improve task outcomes and the quality of CoT prompting.

Dataset-Specific Performance:

  • In mathematical reasoning tasks (GSM8K and MATH), the HCoT models showed notable gains, with the LLaMa2-13B model achieving an improvement of up to 11.16% on the MATH dataset.
  • For question answering on the ScienceQA dataset, the HCoT-Contrast approach excelled, particularly on the natural science and language science subsets, demonstrating the model's versatility.
  • In the agent invocation task on HotpotQA, the HCoT-Contrast method outperformed traditional CoT prompting, with improvements of up to 1.96%.

Practical and Theoretical Implications

Practical Implications:

The proposed HCoT framework offers tangible benefits in scenarios where computational efficiency and latency are critical. By compressing the reasoning steps into a more succinct token representation, the model can deliver faster and more efficient inference, making it feasible to deploy in real-time applications without extensive computational resources.
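As a rough illustration of why shorter outputs translate into lower latency: autoregressive decoding cost grows with the number of generated tokens, so emitting a handful of compressed [CoT] slots instead of a long rationale shrinks the decode budget. The numbers below are hypothetical placeholders chosen only to show the arithmetic, not figures from the paper.

```python
# Back-of-the-envelope latency comparison; all numbers are hypothetical placeholders.
per_token_ms = 25                  # assumed per-token decoding latency
full_cot = 120 + 40                # decoded rationale tokens + answer tokens
hcot = 8 + 40                      # compressed [CoT] slots + answer tokens

print(f"full CoT: {full_cot * per_token_ms} ms")   # 4000 ms
print(f"HCoT    : {hcot * per_token_ms} ms")       # 1200 ms
print(f"speedup : {full_cot / hcot:.1f}x")         # ~3.3x
```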

Theoretical Implications:

The successful integration of contrastive learning within the HCoT framework provides a potent mechanism for enhancing the quality of semantic compression in reasoning tasks. This approach aligns with recent advances in in-context learning and highlights the potential of leveraging specialized token representations to encapsulate complex cognitive processes.

Future Developments

Future research may focus on:

  • Scalability and Resource Utilization: Addressing potential scalability issues and optimizing the training phase to reduce resource demands.
  • Extending HCoT Applications: Exploring the application of HCoT in other domains, such as scientific discovery or legal reasoning, where multi-step reasoning is crucial.
  • Model Interpretability: Enhancing the interpretability of compressed representations to provide more transparent reasoning paths.

Conclusion

The HCoT framework presents a promising advancement in LLM reasoning by balancing the trade-off between computational efficiency and reasoning performance. The empirical results validate the efficacy of the proposed approach across multiple domains, paving the way for more efficient and scalable applications of LLMs in diverse problem-solving contexts. The integration of contrastive learning further enhances the robustness and applicability of the compressed CoT representations, positioning the HCoT framework as a strong candidate for future AI developments.

Authors (5)
  1. Tianqiao Liu (12 papers)
  2. Zui Chen (14 papers)
  3. Zitao Liu (76 papers)
  4. Mi Tian (10 papers)
  5. Weiqi Luo (33 papers)
Citations (1)