- The paper introduces SoftCoT++, a framework enhancing LLM reasoning through test-time scaling in continuous latent space using soft thoughts.
- SoftCoT++ uses multiple specialized initial tokens and a contrastive learning objective to generate diverse soft-thought representations, separating a latent thinking stage from a token-level reasoning stage.
- Experimental validation on five reasoning benchmarks using LLaMA-3.1 and Qwen-3 shows SoftCoT++ significantly outperforms baseline models.
SoftCoT++: Enhancing LLM Reasoning Through Test-Time Scaling in Continuous Space
The paper introduces SoftCoT++, a framework that extends the reasoning capabilities of LLMs by enabling test-time scaling in the continuous latent space of the reasoning process. The approach addresses a limitation of existing methods that scale reasoning only in the discrete token space, which restricts their ability to capture nuanced, continuous reasoning dynamics. The paper situates its contribution within the broader context of Chain-of-Thought (CoT) prompting, a paradigm that has significantly improved LLM performance by generating explicit intermediate reasoning steps.
Key Contributions and Methodologies
SoftCoT++ builds on SoftCoT, which guides the reasoning process with continuous latent representations called soft thoughts. Unlike its predecessor, SoftCoT++ scales these soft thoughts at test time, allowing diverse exploration of reasoning paths: it introduces multiple specialized initial tokens for generating diverse soft thought representations and employs a contrastive learning objective to keep those representations distinct.
The methodology incorporates:
- Thinking and Reasoning Stages: The framework splits generation into two stages: a thinking stage, which produces latent soft thoughts, and a reasoning stage, which generates discrete tokens.
- Multiple Initial Tokens: To simulate parallel scaling in continuous space, SoftCoT++ generates multiple soft thought representations from diverse initial tokens, each seeding a different reasoning path.
- Contrastive Learning: A contrastive training objective pushes the soft thought representations apart, so that they better cover the distribution of plausible reasoning paths and improve model robustness. A minimal sketch of both components follows this list.
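The sketch below illustrates how multiple initial tokens and a contrastive diversity objective could fit together. It is a minimal PyTorch approximation, not the paper's reported implementation: the `assistant_model` interface, the dimensions, and the exact form of the loss are all illustrative assumptions.

```python
import torch
import torch.nn.functional as F

# Hypothetical dimensions; the paper's actual sizes are not given here.
hidden_dim, num_paths, num_soft_tokens = 768, 4, 8

# One learned initial token per reasoning path (thinking-stage input).
initial_tokens = torch.nn.Parameter(torch.randn(num_paths, hidden_dim))

def thinking_stage(assistant_model, question_emb):
    """Produce one soft-thought sequence per initial token.

    `assistant_model` stands in for the auxiliary model that SoftCoT-style
    methods use to emit soft thoughts; its interface is an assumption.
    `question_emb` has shape [batch=1, q_len, hidden_dim].
    """
    soft_thoughts = []
    for t in initial_tokens:                      # one path per token
        prefix = t.expand(1, 1, hidden_dim)       # [1, 1, dim]
        inputs = torch.cat([prefix, question_emb], dim=1)
        hidden = assistant_model(inputs)          # [1, seq, dim], assumed
        soft_thoughts.append(hidden[:, :num_soft_tokens, :])
    return torch.stack(soft_thoughts, dim=0)      # [paths, 1, k, dim]

def contrastive_diversity_loss(soft_thoughts):
    """Penalize pairwise similarity between paths so the soft thoughts
    spread out over distinct reasoning trajectories."""
    flat = soft_thoughts.mean(dim=2).squeeze(1)   # pool tokens: [paths, dim]
    flat = F.normalize(flat, dim=-1)
    sim = flat @ flat.T                           # cosine similarity matrix
    off_diag = sim - torch.eye(num_paths)         # drop self-similarity
    return off_diag.clamp(min=0).mean()
```

Averaging over soft-token positions before measuring similarity is one simple pooling choice; a per-position objective would be an equally plausible variant.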
Experimental Validation
SoftCoT++ was evaluated on five reasoning benchmarks spanning mathematical, commonsense, and symbolic reasoning tasks, using two state-of-the-art LLM backbones: LLaMA-3.1 and Qwen-3. The experiments demonstrate that SoftCoT++ significantly outperforms baseline models, including its predecessor SoftCoT and reasoning methods that scale only in the discrete token space. Notably, the framework composes with existing test-time scaling techniques such as self-consistency, pairing diversity in the continuous thinking stage with diversity in discrete decoding to improve accuracy; a sketch of this combination follows.
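A hedged sketch of combining the soft-thought paths with self-consistency-style majority voting. Both callables are hypothetical stand-ins: `llm_generate` would decode a reasoning chain from the backbone LLM conditioned on soft-thought prefix embeddings, and `extract_answer` would parse the final answer string.

```python
from collections import Counter

def self_consistency_answer(llm_generate, extract_answer,
                            soft_thought_paths, question):
    """Reasoning stage: decode one chain per soft-thought path, then
    majority-vote over the extracted answers (self-consistency)."""
    answers = [
        extract_answer(llm_generate(question, prefix_embeddings=st))
        for st in soft_thought_paths
    ]
    return Counter(answers).most_common(1)[0][0]  # most frequent answer wins
```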
Implications and Future Directions
The results show that SoftCoT++ unlocks further latent potential in LLMs through test-time scaling in the continuous latent space, illustrating that enabling diversity at the representation level can substantially improve reasoning outcomes. This advance opens promising avenues for studying continuous-space reasoning more thoroughly, particularly how soft-thought diversity and performance behave as models scale.
Looking forward, future research could extend these findings to even larger models and explore the interactions between model architecture, scale, and the distribution of soft thoughts. This work has significant implications for building more robust AI systems capable of complex reasoning, contributing meaningfully to the fields of AI and computational linguistics.