Parallel Test-Time Scaling for Latent Reasoning Models (2510.07745v1)

Published 9 Oct 2025 in cs.CL, cs.AI, and cs.LG

Abstract: Parallel test-time scaling (TTS) is a pivotal approach for enhancing LLMs, typically by sampling multiple token-based chains-of-thought in parallel and aggregating outcomes through voting or search. Recent advances in latent reasoning, where intermediate reasoning unfolds in continuous vector spaces, offer a more efficient alternative to explicit Chain-of-Thought, yet whether such latent models can similarly benefit from parallel TTS remains open, mainly due to the absence of sampling mechanisms in continuous space, and the lack of probabilistic signals for advanced trajectory aggregation. \ This work enables parallel TTS for latent reasoning models by addressing the above issues. For sampling, we introduce two uncertainty-inspired stochastic strategies: Monte Carlo Dropout and Additive Gaussian Noise. For aggregation, we design a Latent Reward Model (LatentRM) trained with step-wise contrastive objective to score and guide latent reasoning. Extensive experiments and visualization analyses show that both sampling strategies scale effectively with compute and exhibit distinct exploration dynamics, while LatentRM enables effective trajectory selection. Together, our explorations open a new direction for scalable inference in continuous spaces. Code released at https://github.com/YRYangang/LatentTTS.

Summary

The paper introduces a novel framework using stochastic strategies like MC-dropout and AGN to scale latent reasoning models.
It proposes a latent reward model with a contrastive objective to aggregate diverse latent trajectories effectively.
Experimental results demonstrate improved exploration and aggregation, outperforming traditional majority voting methods.

Parallel Test-Time Scaling for Latent Reasoning Models

Introduction

The paper "Parallel Test-Time Scaling for Latent Reasoning Models" (2510.07745) presents an innovative framework to enhance the efficiency and effectiveness of LLMs through Parallel Test-Time Scaling (TTS). This mechanism traditionally involves sampling token-based chains-of-thought in parallel and employing aggregation strategies like voting or search. The focus on latent reasoning — where reasoning occurs in continuous vector spaces — presents challenges due to the lack of clear sampling methods in continuous spaces and absence of probabilistic signals for sophisticated trajectory aggregation.

Figure 1: Token-based Sampling

Methodology

Stochastic Strategies

The approach introduces two pivotal stochastic sampling strategies for scaling latent reasoning models: Monte Carlo Dropout (MC-dropout) and Additive Gaussian Noise (AGN). MC-dropout aims to incorporate epistemic uncertainty by using dropout during inference, thus capturing variability in model knowledge. AGN complements this by injecting isotropic Gaussian noise into latent thoughts, simulating aleatoric uncertainty.

Figure 2: GSM-Test

These sampling strategies enable generating multiple latent trajectories, each representing distinct reasoning paths characterized by different degrees of uncertainty.

Latent Reward Model

For aggregation, the Latent Reward Model (LatentRM) is proposed, applying a step-wise contrastive objective to evaluate latent reasoning steps analytically. This model operates by summing the latent scores across reasoning steps, acting as a proxy for trajectory quality and serving as a guide for decision-making in the latent space.

Figure 3: Coverage versus diversity for MC-dropout (red) and AGN (blue) with $N \in \{4, 8, 16, 32\}$ by sweeping $p$ and $\sigma$ to span a range of diversity values.

Experimental Results

Sampling Dynamics

Extensive experimentation indicated both MC-dropout and AGN scale well with increased computational resources, while also demonstrating distinct exploration dynamics in latent space. MC-dropout tends to facilitate structured exploration, progressively branching into less conventional solutions, while AGN promotes broad and diverse expansion, enriching environmental diversity.

Figure 4: Diversity of latent trajectories across reasoning steps on GSM-Test with COCONUT. Left: MC-dropout ( $p = 0.1$ - $0.5$). Right: AGN ( $\sigma = 0.1$ - $0.5$).

Aggregation Performance

LatentRM proved effective in trajectory selection under best-of- $N$ and beam search strategies, outperforming traditional aggregation methods such as majority voting. This confirms LatentRM’s ability to improve trajectory recognition and aggregation in latent models, validating its utility in enhancing LLM performance through parallel TTS.

Figure 5: Easy

Ablation studies demonstrated the critical roles of LatentRM’s contrastive supervision and stochastic rollout capabilities in achieving superior trajectory aggregation results.

Figure 6: GSM-Test

Conclusion

The presented framework successfully extends parallel TTS to latent reasoning models, showcasing potential for scalable and efficient parallel inference in continuous vector spaces. By addressing previous limitations in sampling and aggregation, this study opens pathways to more sophisticated latent reasoning strategies, potentially leading to more dynamic and adaptive inference processes in AI systems.

Future research could explore integrating the stochastic sampling and aggregation under reinforcement learning paradigms, allowing models to learn exploration strategies dynamically and optimize compute allocation adaptively across different reasoning tasks. Such advancements could further transform latent reasoning from static inference procedures into adaptive cognitive processes across diverse applications.