
Reasoning Subspace Projection in HARP

Updated 7 March 2026
  • Reasoning Subspace Projection is a technique that decomposes LLM hidden states into semantic and reasoning subspaces using SVD.
  • It isolates internal cognitive traces from direct linguistic output, yielding compact features for accurate hallucination detection.
  • Empirical results demonstrate significant AUROC improvements on benchmark datasets, showcasing its efficiency and robustness.

Reasoning Subspace Projection is a core mechanism of HARP (Hallucination Detection via Reasoning Subspace Projection), a framework designed to disentangle and isolate internal reasoning information within the hidden states of LLMs for the purpose of robust hallucination detection. HARP establishes that the hidden state space of an LLM can be orthogonally decomposed into two complementary subspaces: the semantic subspace, responsible for linguistic and predictive content, and the reasoning subspace, which encodes internal cognitive traces not used directly in output generation. This decomposition leverages Singular Value Decomposition (SVD) applied to the Unembedding layer’s parameter matrix, resulting in a low-dimensional representation that centers on reasoning traces. Subsequent projection onto this reasoning subspace provides highly compact and discriminative features for detecting hallucinations in generated sequences, yielding state-of-the-art performance across standard benchmarks (Hu et al., 15 Sep 2025).

1. Direct-Sum Decomposition of Hidden States

Let $H_\ell \subseteq \mathbb{R}^d$ denote the $d$-dimensional hidden state space at layer $\ell$ of an LLM. The fundamental hypothesis of HARP is that $H_\ell$ admits an orthogonal decomposition:

$H_\ell = S_{\text{semantic}} \oplus S_{\text{reasoning}}$

where:

  • $S_{\text{semantic}}$ encodes information essential for next-token prediction,
  • $S_{\text{reasoning}}$ captures internal reasoning activity disentangled from immediate token output.

For any hidden state $h \in H_\ell$, this yields the additive split:

$h = h_{\text{sem}} + h_{\text{reas}}, \quad h_{\text{sem}} \in S_{\text{semantic}},\; h_{\text{reas}} \in S_{\text{reasoning}}$

with $S_{\text{semantic}} \perp S_{\text{reasoning}}$. This orthogonality is the key property enabling explicit separation of reasoning-related dynamics from surface-level linguistic information.
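The direct-sum structure can be sanity-checked numerically. The following is a minimal sketch, assuming an arbitrary orthonormal basis of $\mathbb{R}^d$ split into two blocks; the dimensions and the random basis are illustrative toys, not taken from HARP:

```python
import numpy as np

rng = np.random.default_rng(0)
d, k = 8, 6  # toy hidden size d and semantic dimension k

# Any orthonormal basis of R^d, split into semantic and reasoning blocks.
Q, _ = np.linalg.qr(rng.standard_normal((d, d)))
S_sem, S_reas = Q[:, :k], Q[:, k:]

h = rng.standard_normal(d)
h_sem = S_sem @ (S_sem.T @ h)    # component inside S_semantic
h_reas = S_reas @ (S_reas.T @ h) # component inside S_reasoning

assert np.allclose(h, h_sem + h_reas)  # additive split h = h_sem + h_reas
assert abs(h_sem @ h_reas) < 1e-10     # the two components are orthogonal
```

Because the two blocks are orthogonal complements, the split is unique: every hidden state has exactly one such decomposition.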

2. Disentanglement via Unembedding Singular Value Decomposition

HARP empirically demonstrates that the Unembedding layer’s parameter matrix, $W_{\text{unemb}} \in \mathbb{R}^{|V| \times d}$, acts as a filter that projects away the reasoning subspace components. To expose the underlying structure, one performs the SVD:

$W_{\text{unemb}} = U \Sigma V^\top = \sum_{i=1}^{d} \sigma_i u_i v_i^\top$

where $U \in \mathbb{R}^{|V| \times |V|}$, $V \in \mathbb{R}^{d \times d}$, and $\Sigma$ contains the singular values $\sigma_1 \geq \sigma_2 \geq \cdots \geq \sigma_d \geq 0$.

Subspace assignment is determined by the energy captured in the singular values:

  • $S_{\text{semantic}} = \operatorname{span}\{v_1, \ldots, v_k\}$, where the top $k$ singular values ($\approx 95\%$ of the Frobenius-norm energy) correspond to semantics,
  • $S_{\text{reasoning}} = \operatorname{span}\{v_{k+1}, \ldots, v_d\}$, with the remaining $d-k \approx 0.05\,d$ dimensions designated as reasoning.

The singular vectors $\{s_j\}_{j=1}^{k}$ and $\{r_i\}_{i=1}^{d-k}$ (relabeling $s_j = v_j$ and $r_i = v_{k+i}$) form orthonormal bases for $S_{\text{semantic}}$ and $S_{\text{reasoning}}$, respectively.
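The subspace assignment above can be sketched in code. This is a minimal illustration assuming the 95% Frobenius-energy threshold described in the text; the random matrix `W` is only a stand-in for a real unembedding matrix, whose singular spectrum would differ:

```python
import numpy as np

rng = np.random.default_rng(1)
V_size, d = 100, 16  # toy vocabulary size |V| and hidden size d
W = rng.standard_normal((V_size, d))  # stand-in for W_unemb

# Thin SVD: rows of Vt are the right singular vectors v_1..v_d,
# with singular values returned in descending order.
U, sigma, Vt = np.linalg.svd(W, full_matrices=False)

# Smallest k whose top singular values carry >= 95% of the
# squared Frobenius norm  sum_i sigma_i^2.
energy = np.cumsum(sigma**2) / np.sum(sigma**2)
k = int(np.searchsorted(energy, 0.95)) + 1

S_basis = Vt[:k].T   # columns s_1..s_k span S_semantic
R_basis = Vt[k:].T   # columns r_1..r_{d-k} span S_reasoning
```

Since $V$ is orthogonal, the two blocks are automatically orthonormal and together span all of $\mathbb{R}^d$.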

3. Reasoning Subspace Projection and Feature Construction

After constructing the basis $\{r_1, \ldots, r_{d-k}\}$ for the reasoning subspace, any hidden state $h \in \mathbb{R}^d$ is decomposed via

$h = \sum_{j=1}^{k} (s_j^\top h)\, s_j + \sum_{i=1}^{d-k} (r_i^\top h)\, r_i = h_{\text{sem}} + h_{\text{reas}}$

and the projection onto the reasoning subspace is given by

$h_{\text{reas}} = \sum_{i=1}^{d-k} (r_i^\top h)\, r_i$

The resulting $h_{\text{reas}}$ lies in a subspace of dimension $d-k \approx 0.05\,d$, providing a noise-filtered, reasoning-centric representation. This compactness is leveraged for efficient and robust downstream discrimination.
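A minimal sketch of this projection, assuming an orthonormal reasoning basis `R` is already available; the dimensions below are illustrative toys:

```python
import numpy as np

def project_reasoning(h, R):
    """Project hidden state h onto the reasoning subspace spanned by R's columns.

    Returns the full-dimensional projection h_reas = sum_i (r_i^T h) r_i and
    the compact (d-k)-dimensional coordinate vector R^T h.
    """
    coords = R.T @ h   # (d-k,) compact coordinates in the reasoning basis
    h_reas = R @ coords  # back in R^d, the component inside S_reasoning
    return h_reas, coords

rng = np.random.default_rng(2)
d, k = 16, 15  # toy sizes: ~5% of dimensions left for reasoning
Q, _ = np.linalg.qr(rng.standard_normal((d, d)))
R = Q[:, k:]   # toy orthonormal reasoning basis r_1..r_{d-k}

h = rng.standard_normal(d)
h_reas, feat = project_reasoning(h, R)
assert np.allclose(R @ (R.T @ h_reas), h_reas)  # projection is idempotent
```

The compact coordinate vector `feat` is the low-dimensional feature that downstream components consume, while `h_reas` is its embedding back in the original hidden space.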

4. Detector Architecture, Training Paradigm, and Loss

For hallucination detection, HARP employs the following methodology:

  • At each token position $i$ in an answer $y$, extract $h^{(i)}_{\text{reas}}$.
  • Pass each $h^{(i)}_{\text{reas}}$ through a two-layer MLP (hidden size $= 1024$, ReLU activations) to obtain a scalar $g_\theta(h^{(i)}_{\text{reas}}) \in [0,1]$.
  • Aggregate to a sequence-level hallucination score:

$g_\theta(y \mid x) = \max_{1 \leq i \leq n} g_\theta(h^{(i)}_{\text{reas}})$

  • Label training data using beam search to sample answer candidates and annotate with hallucination flags.
  • Apply the binary cross-entropy loss:

$\mathcal{L} = -\left[\text{flag} \cdot \log g_\theta(y \mid x) + (1-\text{flag}) \cdot \log\left(1-g_\theta(y \mid x)\right)\right]$

  • Optimization uses Adam with batch size 128 and learning rate $1 \times 10^{-4}$ under cosine decay, for 50 epochs.

This configuration yields high discrimination power while maintaining computational efficiency due to the severe dimensionality reduction of the input features.
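The scoring and loss computations above can be sketched as follows. This is a hedged illustration with untrained toy weights, not the trained HARP detector; the MLP width (1024), ReLU activations, max aggregation, and binary cross-entropy follow the text, while the sigmoid output layer is an assumption made to keep scores in $[0,1]$:

```python
import numpy as np

rng = np.random.default_rng(3)
feat_dim, hidden = 12, 1024  # toy reasoning-feature size; MLP width from the text

# Randomly initialized (untrained) two-layer MLP parameters.
W1 = rng.standard_normal((hidden, feat_dim)) * 0.02
b1 = np.zeros(hidden)
W2 = rng.standard_normal(hidden) * 0.02
b2 = 0.0

def g_theta(h_reas):
    """Token-level hallucination score in [0, 1]."""
    z = np.maximum(W1 @ h_reas + b1, 0.0)        # ReLU hidden layer
    return 1.0 / (1.0 + np.exp(-(W2 @ z + b2)))  # sigmoid output (assumed)

def sequence_score(features):
    """Max-aggregate per-token scores into a sequence-level score."""
    return max(g_theta(f) for f in features)

def bce_loss(score, flag):
    """Binary cross-entropy against the hallucination flag in {0, 1}."""
    eps = 1e-12  # numerical guard against log(0)
    return -(flag * np.log(score + eps) + (1 - flag) * np.log(1 - score + eps))

feats = [rng.standard_normal(feat_dim) for _ in range(5)]  # one 5-token answer
s = sequence_score(feats)
loss = bce_loss(s, flag=1)
```

The max aggregation means a single strongly flagged token suffices to mark the whole answer, which matches the intuition that one hallucinated span makes the sequence hallucinated.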

5. Empirical Performance and Benchmark Results

HARP was evaluated using Qwen-2.5-7B-Instruct and LLaMA-3.1-8B as backbone LLMs across four QA-oriented datasets:

| Dataset | HARP AUROC | Next Best AUROC | Difference |
| --- | --- | --- | --- |
| NQ-Open | 84.0% | 78.9% | +5.1% |
| TruthfulQA | 88.1% | 77.7% | +10.4% |
| TriviaQA | 92.8% | 85.3% | +7.5% |
| TyDiQA-GP | 88.4% | 74.8% | +13.6% |

Comparable or greater absolute gains are obtained using LLaMA-3.1-8B (e.g., on TriviaQA, HARP achieves 92.9% versus previous best near 76%). These results establish HARP’s Reasoning Subspace Projection as state-of-the-art for hallucination detection, surpassing prior detectors by significant margins (Hu et al., 15 Sep 2025).

6. Implications, Robustness, and Potential Extensions

Filtering out the high-dimensional semantic subspace ($\approx 95\%$ of $d$) and focusing on the residual reasoning subspace permits the learning of highly informative yet compact features. This approach produces a single-pass hallucination detector and demonstrates strong robustness under distribution shift, with models trained on one QA dataset maintaining performance when evaluated on others.

Potential extensions documented include:

  • Adaptive selection of $k$ to flexibly trade off between semantic and reasoning content.
  • Integration of projection-based hallucination avoidance into generation loops, e.g., by manipulating $h_{\text{reas}}$ in intermediate model layers.
  • Application of the decomposition to additional text generation domains beyond QA, such as summarization or dialogue, and to non-autoregressive architectures.
  • Further interpretability studies examining individual reasoning basis vectors ($r_i$) in connection with logical or knowledge-centric processing steps.

A plausible implication is that further exploration of the reasoning subspace could yield insights into internal cognitive representations in LLMs, and that suppression or augmentation of $h_{\text{reas}}$ may impact not only hallucination but general model reliability.

Reasoning Subspace Projection as operationalized in HARP is among the first frameworks to systematically leverage structural decomposition of LLM hidden states for hallucination detection, addressing limitations of prior detectors that did not separate semantic and reasoning components. The approach is notable for its robustness, efficiency (through dimensionality reduction), and adaptability to different architectures and tasks (Hu et al., 15 Sep 2025).

Related future research directions include deeper analysis of Unembedding layer properties, extensions to more diverse multilingual and multimodal tasks, and development of causal or contrastive probing tools to further dissect the functional role of semantic and reasoning subspaces.
