
Are language models aware of the road not taken? Token-level uncertainty and hidden state dynamics (2511.04527v1)

Published 6 Nov 2025 in cs.CL and cs.AI

Abstract: When an LLM generates text, the selection of individual tokens might lead it down very different reasoning paths, making uncertainty difficult to quantify. In this work, we consider whether reasoning LLMs represent the alternate paths that they could take during generation. To test this hypothesis, we use hidden activations to control and predict an LLM's uncertainty during chain-of-thought reasoning. In our experiments, we find a clear correlation between how uncertain a model is at different tokens, and how easily the model can be steered by controlling its activations. This suggests that activation interventions are most effective when there are alternate paths available to the model -- in other words, when it has not yet committed to a particular final answer. We also find that hidden activations can predict a model's future outcome distribution, demonstrating that models implicitly represent the space of possible paths.

Summary

  • The paper extends Forking Paths Analysis to quantify token-level uncertainty, mapping outcome distributions across tokens and detecting forking tokens via Bayesian change point detection.
  • It applies activation steering in Llama-3.2 3B Instruct, showing that outputs can be redirected effectively only before the model has decisively committed to an answer.
  • Probing hidden states reveals that models encode latent alternate reasoning paths, offering efficient alternatives to exhaustive resampling.

Token-level Uncertainty and Hidden State Dynamics in LLMs

Introduction

This paper investigates whether autoregressive LLMs internally represent "the road not taken"—the latent space of alternate reasoning paths and outcomes at each token generation step. Specifically, the authors probe whether uncertainty and alternative completions are encoded in the model’s sequence of hidden activations during stepwise chain-of-thought (CoT) inference. The work combines Forking Paths Analysis (FPA), model steering via activation interventions, and probing techniques to elucidate the dynamics between token-level uncertainty and hidden state evolution, focusing on the Llama-3.2 3B Instruct model.

Forking Paths Analysis: Quantifying Token-level Uncertainty

The authors extend FPA methods to estimate the granular progression of uncertainty at each token during text generation. FPA quantifies how the autoregressive sampling of different tokens at position $t$ can induce widely divergent completions, thus mapping a distribution over final outcomes $o_t$ conditioned on all branching continuations seeded at $t$.

The procedure involves:

  1. Sampling a "base path" CoT completion.
  2. Iterating over token positions $t$, and for each, resampling continuations starting from alternative top-N substitutions at $t$.
  3. Extracting outcome answers for the completions and empirically building $o_t$ as a mixture over possible answers (a minimal sketch of this loop follows the list).
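
As a concrete illustration of this procedure, the sketch below estimates $o_t$ by branching from the top-N alternative tokens at a single position and resampling continuations. It assumes a standard HuggingFace transformers setup; the `extract_answer` helper, the probability-weighted aggregation, and the sampling budget are illustrative assumptions, not the paper's exact protocol.

```python
# Sketch of Forking Paths Analysis resampling at one token position.
# Assumptions: a HuggingFace causal LM, 1-D LongTensors for the prompt and base path,
# and a hypothetical `extract_answer` parser for the final answer.
from collections import Counter

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_NAME = "meta-llama/Llama-3.2-3B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForCausalLM.from_pretrained(MODEL_NAME)

def extract_answer(text: str) -> str:
    """Hypothetical helper: parse the final answer string out of a completion."""
    return text.strip().split()[-1]

def outcome_distribution_at_t(prompt_ids, base_ids, t, top_n=5, n_samples=8):
    """Estimate o_t by substituting the top-N alternative tokens at position t and resampling."""
    prefix = torch.cat([prompt_ids, base_ids[:t]])              # base path up to (not including) token t
    with torch.no_grad():
        logits = model(prefix.unsqueeze(0)).logits[0, -1]        # next-token logits at position t
    top_probs, top_tokens = torch.softmax(logits, dim=-1).topk(top_n)

    outcome_mass = Counter()
    for p, tok in zip(top_probs.tolist(), top_tokens.tolist()):
        branch = torch.cat([prefix, torch.tensor([tok])]).unsqueeze(0)
        completions = model.generate(                            # resample continuations from this branch
            branch, do_sample=True, max_new_tokens=256,
            num_return_sequences=n_samples, pad_token_id=tokenizer.eos_token_id,
        )
        for seq in completions:
            text = tokenizer.decode(seq[branch.shape[1]:], skip_special_tokens=True)
            outcome_mass[extract_answer(text)] += p / n_samples  # weight branches by token probability
    total = sum(outcome_mass.values())
    return {answer: mass / total for answer, mass in outcome_mass.items()}
```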

Abrupt shifts in $o_t$, indicative of discrete “forking tokens”, are detected via Bayesian change point detection (CPD), revealing positions in the generated sequence where the model's outcome uncertainty collapses or decisively commits. The computational overhead of this approach is substantial, as it requires resampling and scoring over a combinatorially expanding tree of possible continuations.

Figure 1: Our experimental set-up. By intervening on the generated tokens, we create branching paths to estimate the model's outcome distribution. By intervening on the model's activations, we steer the base generation towards a desired outcome.
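
The paper's Bayesian change point detection operates on the sequence of outcome distributions. As a rough, simplified stand-in (not the paper's method), the sketch below flags positions where $o_t$ shifts abruptly in total-variation distance; the threshold is chosen purely for illustration.

```python
# Simplified stand-in for change point detection over outcome distributions:
# flag "forking tokens" where o_t jumps sharply relative to o_{t-1}.
# This is a heuristic illustration, not the Bayesian CPD used in the paper.

def total_variation(p: dict, q: dict) -> float:
    """Total-variation distance between two distributions over answer labels."""
    support = set(p) | set(q)
    return 0.5 * sum(abs(p.get(a, 0.0) - q.get(a, 0.0)) for a in support)

def forking_tokens(outcome_dists: list, threshold: float = 0.3) -> list:
    """Return token positions t where o_t changes abruptly (threshold is illustrative)."""
    return [
        t for t in range(1, len(outcome_dists))
        if total_variation(outcome_dists[t - 1], outcome_dists[t]) > threshold
    ]
```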

Activation Steering Toward Target Outcomes

Building on the interpretable geometry of LM representations, the paper applies a difference-in-means strategy to construct linear steering directions in hidden state space that shift the model's completions toward specific answers. For each outcome $A$, activations $h_t^{(A)}$ (leading to $A$) and $h_t^{(\overline{A})}$ (leading to other outcomes) are aggregated at each token position, yielding a mean-difference vector $s_t^{(A)}$. This vector, added to the residual stream at each token, enables targeted steering of the generation toward the desired outcome.
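
A minimal sketch of the difference-in-means construction and its application to the residual stream is given below; the hooked layer, the steering scale, and the `model.model.layers` access path (which matches Llama-family models in transformers) are assumptions rather than details taken from the paper.

```python
# Sketch of difference-in-means activation steering.
# Assumptions: activations are collected as (num_paths, hidden_dim) tensors at a fixed
# token position and layer; the layer index and scale below are illustrative.
import torch

def steering_vector(acts_to_A: torch.Tensor, acts_to_other: torch.Tensor) -> torch.Tensor:
    """s^(A): mean activation on paths reaching outcome A minus mean on paths reaching other outcomes."""
    return acts_to_A.mean(dim=0) - acts_to_other.mean(dim=0)

def add_steering_hook(model, layer_idx: int, s: torch.Tensor, scale: float = 1.0):
    """Register a forward hook that adds the steering vector to one decoder layer's residual stream."""
    def hook(module, inputs, output):
        hidden = output[0] if isinstance(output, tuple) else output
        hidden = hidden + scale * s.to(device=hidden.device, dtype=hidden.dtype)
        return (hidden, *output[1:]) if isinstance(output, tuple) else hidden

    # `model.model.layers` matches Llama-style models in transformers; other architectures differ.
    return model.model.layers[layer_idx].register_forward_hook(hook)

# Usage sketch:
#   handle = add_steering_hook(model, layer_idx=16, s=steering_vector(acts_A, acts_not_A))
#   steered_ids = model.generate(input_ids, max_new_tokens=256)
#   handle.remove()
```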

Key findings include:

  • Steering is effective only at tokens preceding major commitment points: When the model is uncertain (outcome distribution $o_t$ is mixed), steering interventions successfully redirect completions toward $A$. After passing forking tokens where $o_t$ becomes concentrated, steering effects sharply diminish.
  • The dynamics of steerability and outcome uncertainty are tightly coupled: Across examples, the token-wise steering success rate closely follows changes in $o_t$, with abrupt drops in both occurring at the same CPD-identified tokens (Figure 2).

    Figure 2: Comparison of the model outcome distribution $o_t$ (top) and steering success (bottom) across tokens. The outcome distribution and steering success have similar dynamics, with the same change points detected by the CPD algorithm (highlighted text).

Quantitatively, the correlation between baseline uncertainty (probability mass of the intended answer in $o_t$) and empirical steering success is moderate to strong ($R=0.57$ for the main example, $R=0.64$ averaged across examples), indicating that effective controllability is a marker for model indecision (Figure 3).

Figure 3: Correlation between steering success (y-axis) and base outcome probability (x-axis) across token positions.

These results imply that model steering by direct activation intervention is maximally potent only before the network has functionally broken degeneracies and settled on a reasoning path. This has implications for guided generation, safety, and interpretability: interventions are viable only during the model's "deliberation" phase and become inoperative once the final answer is encoded and rendered irrevocable in subsequent hidden states.

Predicting Outcome Distributions from Hidden Activations

To further examine whether a model’s hidden states embed information about alternate futures, the paper introduces a probing approach:

  • At each token $t$, a linear probe is trained on the hidden activation $h_t$ to predict the empirical outcome distribution $o_t$ as constructed by FPA (a minimal probe sketch follows this list).
  • To disentangle semantic from model-specific features, the probe is also applied to embeddings $h'_t$ from a comparable LLM (Gemma-2 2B Instruct) presented with the same CoT prefix.
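
A minimal sketch of such a probe is shown below: a single linear layer mapping $h_t$ to a predicted outcome distribution, trained against the FPA-derived $o_t$ with a KL-divergence objective. The optimizer, learning rate, and epoch count are illustrative assumptions, not the paper's reported training setup.

```python
# Sketch of a linear probe from hidden activations h_t to the outcome distribution o_t.
# Hyperparameters are illustrative; the paper's exact training setup may differ.
import torch
import torch.nn as nn
import torch.nn.functional as F

def train_probe(activations: torch.Tensor, targets: torch.Tensor,
                epochs: int = 200, lr: float = 1e-3) -> nn.Linear:
    """activations: (N, hidden_dim) hidden states at sampled token positions.
    targets: (N, num_outcomes) empirical outcome distributions o_t from FPA."""
    probe = nn.Linear(activations.shape[1], targets.shape[1])
    optimizer = torch.optim.Adam(probe.parameters(), lr=lr)
    for _ in range(epochs):
        optimizer.zero_grad()
        log_pred = F.log_softmax(probe(activations), dim=-1)
        loss = F.kl_div(log_pred, targets, reduction="batchmean")  # KL(o_t || predicted)
        loss.backward()
        optimizer.step()
    return probe
```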

Both probes outperform random and majority-class baselines, with best performance in mid-to-late layers. However, probes on Llama’s own activations attain consistently lower KL divergence losses (e.g., $0.11$ at layer 8 for Llama versus $0.19$ for Gemma), especially in later layers. This demonstrates that the idiosyncratic hidden state transitions of a specific LLM—not merely the semantic content of the generated text—encode more detailed information about its own prospective branching completions.

Figure 4: Our experimental set-up for the probing analysis (predicting the outcome distribution from hidden activations).


Figure 5: KL loss (lower is better) for linear probes predicting the outcome distribution of Llama from the hidden representations of Llama (blue) and Gemma (green) at the same token mid-generation. Low loss suggests that hidden states over chain-of-thought text are predictive of Llama's outcome distribution.

This provides strong evidence that a model’s internal representation progresses through intermediate states that retain probabilistic information about diverging outcome futures—information that is only partially accessible at the text level. Consequently, probing hidden activations offers an efficient, token-local analytical alternative to computationally expensive FPA tree expansions.

Additional Case Studies and Robustness

Supplementary analyses (Figure 6) demonstrate that the observed dynamics generalize across multiple datasets (GSM8k, AQuA, GPQA) and instances with strong answer uncertainty. The signature pattern—alignment between forking tokens, sharp drops in steering success, and abrupt changes in $o_t$—holds consistently.


Figure 6: Three additional examples of steering analysis. Each column corresponds to a single example, with outcome distribution and steering success showing similar abrupt transitions at identified change points.

Implications and Future Directions

This work elucidates a direct mechanistic correspondence between token-level uncertainty, path dependency, and representational dynamics within transformer LLMs. Notably:

  • Steering and uncertainty estimation can be operationally coupled: Interventions and probes can inform real-time uncertainty diagnostics and model calibration.
  • Model-specific representation captures latent possibilities: Hidden activations faithfully encode not only the committed output sequence, but the probabilistic space of alternate completions.
  • Activation-based interventions have intrinsic limits: Once the model's trajectory passes a decisional bottleneck (as defined by FPA/CPD), the space of reachable outputs collapses, sharply reducing steering sensitivity.

Practically, this work suggests that efficient estimation of a model’s internal uncertainty—and thus the identification of optimal intervention points—can be achieved via hidden state probing rather than brute-force generation. The theoretical implication is that LLMs realize probability distributions over reasoning paths not just in their outputs, but throughout the computation, with commitment encoded by discrete transitions in representation space.

Future research could extend these methods to larger or more open-ended models, study robustness to prompt variations, or develop real-time uncertainty-aware steering policies. Additionally, the distinction between semantic and model-specific representational components could be further dissected, possibly aiding in model alignment, safety, and interpretability.

Conclusion

By combining FPA, activation steering, and hidden state probing, this paper demonstrates that LLMs internally encode token-level uncertainty and preserve information about alternate reasoning paths during generation. The link between outcome uncertainty and controllability is empirically validated, and probing methods provide an efficient avenue for accessing outcome distributions, bypassing extensive resampling. These results have significant implications for understanding, controlling, and reliably deploying LLMs in settings where uncertainty and interpretability are paramount.


Open Problems

We found no open problems mentioned in this paper.
