Reasoning Subspace Projection in HARP
- Reasoning Subspace Projection is a technique that decomposes LLM hidden states into semantic and reasoning subspaces using SVD.
- It isolates internal cognitive traces from direct linguistic output, yielding compact features for accurate hallucination detection.
- Empirical results demonstrate significant AUROC improvements on benchmark datasets, showcasing its efficiency and robustness.
Reasoning Subspace Projection is a core mechanism of HARP (Hallucination Detection via Reasoning Subspace Projection), a framework designed to disentangle and isolate internal reasoning information within the hidden states of LLMs for the purpose of robust hallucination detection. HARP establishes that the hidden state space of an LLM can be orthogonally decomposed into two complementary subspaces: the semantic subspace, responsible for linguistic and predictive content, and the reasoning subspace, which encodes internal cognitive traces not used directly in output generation. This decomposition leverages Singular Value Decomposition (SVD) applied to the Unembedding layer’s parameter matrix, resulting in a low-dimensional representation that centers on reasoning traces. Subsequent projection onto this reasoning subspace provides highly compact and discriminative features for detecting hallucinations in generated sequences, yielding state-of-the-art performance across standard benchmarks (Hu et al., 15 Sep 2025).
1. Direct-Sum Decomposition of Hidden States
Let $\mathcal{H} \subseteq \mathbb{R}^{d}$ denote the $d$-dimensional hidden state space at layer $\ell$ of an LLM. The fundamental hypothesis of HARP is that $\mathcal{H}$ admits an orthogonal decomposition:

$$\mathcal{H} = \mathcal{S} \oplus \mathcal{R},$$

where:
- $\mathcal{S}$ (the semantic subspace) encodes information essential for next-token prediction,
- $\mathcal{R}$ (the reasoning subspace) captures internal reasoning activity disentangled from immediate token output.
For any hidden state $h \in \mathcal{H}$, this yields the additive split:

$$h = h_{\mathcal{S}} + h_{\mathcal{R}},$$

with $h_{\mathcal{S}} \in \mathcal{S}$, $h_{\mathcal{R}} \in \mathcal{R}$, and $h_{\mathcal{S}} \perp h_{\mathcal{R}}$. This orthogonality is a key property enabling explicit separation of reasoning-related dynamics from surface-level linguistic information.
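As a minimal numerical illustration of this direct-sum property (not taken from the paper; all names and dimensions below are illustrative), the following sketch splits an arbitrary orthonormal basis of $\mathbb{R}^d$ into two complementary blocks and verifies that the resulting components are orthogonal and sum back to the original hidden state:

```python
import numpy as np

rng = np.random.default_rng(0)
d, k = 16, 12                       # illustrative hidden size and semantic rank

# Any orthonormal basis of R^d, split into two complementary blocks.
Q, _ = np.linalg.qr(rng.standard_normal((d, d)))
V_sem, V_rea = Q[:, :k], Q[:, k:]   # bases for S and R

h = rng.standard_normal(d)          # a hidden state
h_sem = V_sem @ (V_sem.T @ h)       # component lying in the semantic subspace
h_rea = V_rea @ (V_rea.T @ h)       # component lying in the reasoning subspace

assert np.allclose(h, h_sem + h_rea)   # additive split h = h_S + h_R
assert abs(h_sem @ h_rea) < 1e-10      # orthogonality <h_S, h_R> = 0
```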
2. Disentanglement via Unembedding Singular Value Decomposition
HARP empirically demonstrates that the Unembedding layer's parameter matrix, $W_U \in \mathbb{R}^{|V| \times d}$ (with $|V|$ the vocabulary size), acts as a filter that projects away the reasoning subspace components. To expose the underlying structure, one performs the SVD:

$$W_U = U \Sigma V^{\top},$$

where $U \in \mathbb{R}^{|V| \times |V|}$, $V \in \mathbb{R}^{d \times d}$, and $\Sigma \in \mathbb{R}^{|V| \times d}$ contains the singular values $\sigma_1 \geq \sigma_2 \geq \cdots \geq \sigma_d \geq 0$.
Subspace assignment is determined by the energy captured in the singular values:
- the top $k$ singular values (capturing the dominant fraction of the Frobenius-norm energy) correspond to semantics,
- the remaining $d - k$ dimensions are designated as reasoning.
The right singular vectors $V_{\mathcal{S}} = [v_1, \ldots, v_k]$ and $V_{\mathcal{R}} = [v_{k+1}, \ldots, v_d]$ form orthonormal bases for $\mathcal{S}$ and $\mathcal{R}$, respectively.
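A minimal sketch of this construction in NumPy follows; the function name and the Frobenius-energy cutoff `energy` are illustrative assumptions, not the paper's exact values or API:

```python
import numpy as np

def split_unembedding_subspaces(W_U: np.ndarray, energy: float = 0.99):
    """Split the right singular vectors of the unembedding matrix W_U
    (shape: vocab_size x d) into semantic and reasoning bases.

    `energy` is an illustrative Frobenius-energy cutoff, not the paper's value.
    """
    # Economy SVD: W_U = U @ diag(s) @ Vt, with Vt of shape (d, d) when vocab >= d.
    _, s, Vt = np.linalg.svd(W_U, full_matrices=False)
    cum_energy = np.cumsum(s**2) / np.sum(s**2)       # cumulative Frobenius-norm energy
    k = int(np.searchsorted(cum_energy, energy)) + 1  # smallest k reaching the cutoff
    V = Vt.T                                          # columns = right singular vectors
    return V[:, :k], V[:, k:], k                      # V_S, V_R, split rank

# Example: V_sem, V_rea, k = split_unembedding_subspaces(W_U)
```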
3. Reasoning Subspace Projection and Feature Construction
After constructing the basis for the reasoning subspace, any hidden state $h \in \mathcal{H}$ is decomposed via

$$h = V_{\mathcal{S}} V_{\mathcal{S}}^{\top} h + V_{\mathcal{R}} V_{\mathcal{R}}^{\top} h,$$

and the projection onto the reasoning subspace is given by

$$r = V_{\mathcal{R}}^{\top} h \in \mathbb{R}^{d-k}.$$

The resulting $r$ is a vector of dimension $d - k$, providing a noise-filtered, reasoning-centric representation. This compactness is leveraged for efficient and robust downstream discrimination.
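Given the bases above, the projection step reduces to a single matrix product per token. The helper below is an illustrative sketch (the function name and shapes are assumptions, not the paper's API):

```python
import numpy as np

def reasoning_features(hidden_states: np.ndarray, V_rea: np.ndarray) -> np.ndarray:
    """Project per-token hidden states (seq_len x d) onto the reasoning basis
    V_rea (d x (d - k)); each row of the result is r_t = V_R^T h_t."""
    return hidden_states @ V_rea                      # shape: (seq_len, d - k)

# Usage with the bases from the previous sketch:
# R = reasoning_features(H_layer, V_rea)              # compact, reasoning-centric features
```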
4. Detector Architecture, Training Paradigm, and Loss
For hallucination detection, HARP employs the following methodology:
- At each token position $t$ in an answer $a$, extract the reasoning-subspace feature $r_t = V_{\mathcal{R}}^{\top} h_t$.
- Pass each $r_t$ through a two-layer MLP with ReLU activations to obtain a scalar token-level score $s_t$.
- Aggregate the token-level scores $\{s_t\}$ into a sequence-level hallucination score $S(a)$.
- Label training data using beam search to sample answer candidates and annotate them with hallucination flags.
- Apply the binary cross-entropy loss: $\mathcal{L} = -\frac{1}{N} \sum_{i=1}^{N} \left[ y_i \log S(a_i) + (1 - y_i) \log\bigl(1 - S(a_i)\bigr) \right]$.
- Optimization uses Adam with batch size 128, a cosine-decayed learning rate, and 50 training epochs.
This configuration yields high discrimination power while maintaining computational efficiency due to the severe dimensionality reduction of the input features.
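The detector itself can be sketched in a few lines of PyTorch; the hidden width, learning rate, and mean-pooling aggregation of token scores below are illustrative placeholders rather than the paper's exact configuration:

```python
import torch
import torch.nn as nn

class HallucinationHead(nn.Module):
    """Two-layer MLP scorer over reasoning-subspace features; hidden width and
    mean-pooling of token scores are illustrative, not the paper's choices."""
    def __init__(self, in_dim: int, hidden_dim: int = 256):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(in_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, 1),
        )

    def forward(self, r: torch.Tensor) -> torch.Tensor:
        # r: (batch, seq_len, d - k) reasoning features
        token_scores = self.mlp(r).squeeze(-1)        # per-token scores s_t
        return token_scores.mean(dim=-1)              # sequence-level logit

model = HallucinationHead(in_dim=64)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)  # lr is a placeholder value
sched = torch.optim.lr_scheduler.CosineAnnealingLR(opt, T_max=50)
loss_fn = nn.BCEWithLogitsLoss()                      # binary cross-entropy

# One illustrative step on dummy data (batch size 128, as in the training setup).
r = torch.randn(128, 32, 64)                          # fake reasoning features
y = torch.randint(0, 2, (128,)).float()               # hallucination labels
loss = loss_fn(model(r), y)
loss.backward(); opt.step(); opt.zero_grad(); sched.step()
```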
5. Empirical Performance and Benchmark Results
HARP was evaluated using Qwen-2.5-7B-Instruct and LLaMA-3.1-8B as backbone LLMs across four QA-oriented datasets:
| Dataset | HARP AUROC | Next-Best AUROC | Absolute Gain |
|---|---|---|---|
| NQ-Open | 84.0% | 78.9% | +5.1 pts |
| TruthfulQA | 88.1% | 77.7% | +10.4 pts |
| TriviaQA | 92.8% | 85.3% | +7.5 pts |
| TyDiQA-GP | 88.4% | 74.8% | +13.6 pts |
Comparable or greater absolute gains are obtained using LLaMA-3.1-8B (e.g., on TriviaQA, HARP achieves 92.9% versus previous best near 76%). These results establish HARP’s Reasoning Subspace Projection as state-of-the-art for hallucination detection, surpassing prior detectors by significant margins (Hu et al., 15 Sep 2025).
6. Implications, Robustness, and Potential Extensions
Filtering out the high-dimensional semantic subspace (the top-$k$ directions of $\mathcal{H}$) and focusing on the residual reasoning subspace permits the learning of highly informative yet compact features. This approach produces a single-pass hallucination detector and demonstrates strong robustness under distribution shift, with models trained on one QA dataset maintaining performance when evaluated on others.
Potential extensions documented include:
- Adaptive selection of the split rank $k$ to flexibly trade off between semantic and reasoning content.
- Integration of projection-based hallucination avoidance into generation loops, e.g., by manipulating $h_{\mathcal{R}}$ in intermediate model layers.
- Application of the decomposition to additional text generation domains beyond QA, such as summarization or dialogue, and to non-autoregressive architectures.
- Further interpretability studies examining individual reasoning basis vectors (columns of $V_{\mathcal{R}}$) in connection with logical or knowledge-centric processing steps.
A plausible implication is that further exploration of the reasoning subspace could yield insights into internal cognitive representations in LLMs, and that suppression or augmentation of $h_{\mathcal{R}}$ may impact not only hallucination but general model reliability.
7. Context and Related Methodologies
Reasoning Subspace Projection as operationalized in HARP is among the first frameworks to systematically leverage structural decomposition of LLM hidden states for hallucination detection, addressing limitations of prior detectors that did not separate semantic and reasoning components. The approach is notable for its robustness, efficiency (through dimensionality reduction), and adaptability to different architectures and tasks (Hu et al., 15 Sep 2025).
Related future research directions include deeper analysis of Unembedding layer properties, extensions to more diverse multilingual and multimodal tasks, and development of causal or contrastive probing tools to further dissect the functional role of semantic and reasoning subspaces.