Validity of the content-subspace projection assumption

Determine whether the residual contamination in the per-example steering vector d_k, computed as the mean hidden state at behavior-boundary paragraphs minus the mean hidden state at execution paragraphs, lies within the content subspace that is estimated via SVD from question-only hidden states at the target layer used for steering-vector construction.

Background

To reduce question-specific noise, the method estimates a content subspace from question-only hidden states and projects steering vectors to remove this component.
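The projection step can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names `content_basis` and `project_out`, the mean-centering before the SVD, the subspace rank, and the synthetic random data are all assumptions made for illustration only.

```python
import numpy as np


def content_basis(question_states: np.ndarray, rank: int) -> np.ndarray:
    """Orthonormal basis (hidden_dim x rank) for an estimated content subspace.

    question_states: (num_questions, hidden_dim) question-only hidden states.
    Mean-centering and the choice of `rank` are assumptions of this sketch.
    """
    centered = question_states - question_states.mean(axis=0, keepdims=True)
    # Rows of vt are orthonormal directions in hidden space,
    # ordered by decreasing singular value.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return vt[:rank].T


def project_out(d_k: np.ndarray, basis: np.ndarray) -> np.ndarray:
    """Remove from d_k its component inside span(basis)."""
    return d_k - basis @ (basis.T @ d_k)


# Synthetic stand-ins for real hidden states (hypothetical shapes).
rng = np.random.default_rng(0)
question_states = rng.normal(size=(32, 16))
boundary_states = rng.normal(size=(5, 16))   # behavior-boundary paragraphs
execution_states = rng.normal(size=(7, 16))  # execution paragraphs

# Per-example steering vector: difference of the two means.
d_k = boundary_states.mean(axis=0) - execution_states.mean(axis=0)

basis = content_basis(question_states, rank=4)
d_clean = project_out(d_k, basis)
```

After the projection, `d_clean` has no component along any of the estimated content directions. Whether the contamination actually present in d_k lay along those directions to begin with is exactly what the sketch cannot establish.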

The authors emphasize that this projection is a heuristic and explicitly acknowledge that it is unknown whether the residual contamination in the per-example difference d_k actually resides in the estimated content subspace.
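One hypothetical way to quantify the question, assuming an independent estimate of the residual were available, is the fraction of that residual's squared norm captured by the estimated subspace. The helper `captured_fraction` and the toy vectors below are illustrative assumptions and do not come from the paper.

```python
import numpy as np


def captured_fraction(residual: np.ndarray, basis: np.ndarray) -> float:
    """Share of the residual's squared norm lying in span(basis).

    basis: (hidden_dim, rank) matrix with orthonormal columns.
    Returns a value in [0, 1]; 0.0 is returned for a zero residual.
    """
    total = float(residual @ residual)
    if total == 0.0:
        return 0.0
    coords = basis.T @ residual  # coordinates of the residual in the subspace
    return float(coords @ coords) / total


# Toy subspace: the first two coordinate axes of a 4-dimensional space.
basis = np.eye(4)[:, :2]

inside = np.array([3.0, -1.0, 0.0, 0.0])   # entirely in the subspace
outside = np.array([0.0, 0.0, 2.0, 5.0])   # entirely orthogonal to it
mixed = np.array([1.0, 0.0, 1.0, 0.0])     # split evenly

frac_inside = captured_fraction(inside, basis)
frac_outside = captured_fraction(outside, basis)
frac_mixed = captured_fraction(mixed, basis)
```

A fraction near 1 would be consistent with the projection assumption, and a fraction near 0 would indicate that the projection leaves the contamination largely untouched. Obtaining a trustworthy estimate of the residual is the hard part and is not addressed here.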

References

This is a heuristic: we do not know that the residual in $d_k$ falls exactly in this subspace, but question-only representations provide a reasonable proxy for the directions we want to suppress.

— Reliable Control-Point Selection for Steering Reasoning in Large Language Models (2604.02113 - Zhuang et al., 2 Apr 2026) in Section 3 (Method), Content-Subspace Projection
