
NSPO: Null-Space Constrained Policy Optimization

Updated 8 April 2026
  • NSPO is a family of geometric algorithms that restrict policy updates to the null space of constraint matrices, maintaining key invariants.
  • It leverages null-space projections and Riemannian gradients to achieve local quadratic convergence and robust safety in both control synthesis and LLM alignment.
  • Applications of NSPO span Schur-stable feedback control and reinforcement learning safety, delivering improved safety metrics with minimal impact on general capabilities.

Null-Space Constrained Policy Optimization (NSPO) is a family of geometric algorithms for policy optimization under explicit linear constraints. The core principle is to confine gradient or Newton-style policy updates to the null space of specified constraint matrices, ensuring that policy parameters are only allowed to vary in directions which do not compromise predefined invariants or capabilities. This framework is applicable both in feedback control synthesis under linear constraints and, more recently, in reinforcement learning settings for large-language-model (LLM) alignment, where it enables safety alignment without sacrificing general-purpose abilities.

1. Mathematical Foundation and Constraints

NSPO formalizes the constrained policy space as the intersection of a feasible set (e.g., stabilizing policies or general-capability-preserving policies) and an affine subspace defined by explicit linear constraints. In the context of Schur-stable feedback synthesis, this is represented as follows: let $K \in \mathbb{R}^{m \times n}$ be a feedback gain, let $S = \{K : \rho(A + BK) < 1\}$ be the manifold of Schur-stable controllers for system matrices $(A, B)$, and consider a family of linear constraints $C\,\mathrm{vec}(K) = d$ with $C \in \mathbb{R}^{p \times (mn)}$ and $d \in \mathbb{R}^p$. The feasible manifold is

$$M = S \cap \{K : C\,\mathrm{vec}(K) = d\}$$

where $M$ is an embedded submanifold of dimension $mn - p$ under standard rank conditions (Talebi et al., 2022).
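
To make the definition of $M$ concrete, the following sketch checks membership numerically: a gain belongs to $M$ when the closed-loop spectral radius is below one and the linear constraint holds. The system matrices, the constraint, and the helper names are hypothetical choices made for illustration, and $\mathrm{vec}$ is assumed to stack columns.

```python
import numpy as np

def spectral_radius(M):
    """Largest eigenvalue magnitude of a square matrix."""
    return float(np.max(np.abs(np.linalg.eigvals(M))))

def in_feasible_set(K, A, B, C, d, tol=1e-9):
    """Return True if K is Schur-stabilizing and satisfies C vec(K) = d."""
    stable = spectral_radius(A + B @ K) < 1.0
    # vec(K) stacks the columns of K (column-major order); this is an assumption.
    residual = C @ K.flatten(order="F") - d
    return stable and bool(np.linalg.norm(residual) <= tol)

# Hypothetical 2-state, 1-input system (not taken from the cited papers).
A = np.array([[1.1, 0.3], [0.0, 0.9]])
B = np.array([[1.0], [0.5]])
# One constraint (p = 1): the second entry of K must be zero.
C = np.array([[0.0, 1.0]])
d = np.array([0.0])

K_good = np.array([[-0.6, 0.0]])      # stabilizing and satisfies the constraint
K_bad = np.array([[-0.6, 0.2]])       # violates the linear constraint
K_unstable = np.array([[0.0, 0.0]])   # satisfies the constraint, but A is unstable

print(in_feasible_set(K_good, A, B, C, d))
print(in_feasible_set(K_bad, A, B, C, d))
print(in_feasible_set(K_unstable, A, B, C, d))
```

Here $m = 1$, $n = 2$, and $p = 1$, so the feasible set has dimension $mn - p = 1$: only the first entry of the gain is free.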

In safety alignment for LLMs, the constraint is that updates to the parameters $\theta$ should not interfere with general-capability representations, as encoded in a matrix $K$ of hidden-state vectors (the symbol is reused here and is unrelated to the feedback gain above). The linear constraint is that $\theta$ is only updated in directions orthogonal to the column span of $K$ (Niu et al., 12 Dec 2025).

2. Null-Space Projections and Geometry

The null-space projector is constructed to ensure that updates remain feasible. Algebraically, for a set of linear constraints $C\,\mathrm{vec}(K) = d$, the tangent directions form the kernel of $C$, and for $C$ of full row rank the orthogonal projection onto this kernel is $P = I - C^\top (C C^\top)^{-1} C$. For LLM alignment, the null space is obtained from the singular value decomposition (SVD) of the matrix $K K^\top$ built from the hidden-state matrix $K$: it is spanned by the columns $\hat{U}$ of $U$ whose eigenvalues $\lambda_i$ fall below a threshold, yielding the projector $P = \hat{U} \hat{U}^\top$ (Niu et al., 12 Dec 2025).

These projections guarantee that any update $\Delta K$ to the policy satisfies $C\,\mathrm{vec}(\Delta K) = 0$ or, in the LLM case, that the hidden-state feature representations for the general-capability data remain invariant under the new policy.
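
The projector for the linear-constraint case can be sketched in a few lines. The example below assumes a full-row-rank constraint matrix and column-stacking $\mathrm{vec}$; the random data and the function name `nullspace_projector` are illustrative only.

```python
import numpy as np

def nullspace_projector(C):
    """Orthogonal projector onto ker(C), assuming C has full row rank."""
    n = C.shape[1]
    # P = I - C^T (C C^T)^{-1} C, computed with a linear solve instead of an inverse.
    return np.eye(n) - C.T @ np.linalg.solve(C @ C.T, C)

rng = np.random.default_rng(0)
m, n, p = 2, 3, 2
C = rng.standard_normal((p, m * n))      # hypothetical constraint matrix
P = nullspace_projector(C)

G = rng.standard_normal((m, n))          # arbitrary ambient direction
g_proj = P @ G.flatten(order="F")        # project vec(G) onto ker(C)
Delta_K = g_proj.reshape((m, n), order="F")

# The projected direction leaves C vec(K) unchanged (zero up to round-off).
constraint_change = C @ Delta_K.flatten(order="F")
print(np.linalg.norm(constraint_change))
```

Because the projected direction lies in the kernel of $C$, adding any multiple of `Delta_K` to a gain that satisfies $C\,\mathrm{vec}(K) = d$ keeps the constraint satisfied.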

3. NSPO Optimization Algorithms

Feedback Controller Synthesis

For linearly constrained quadratic-regulator design, a Riemannian metric is adopted on $S$, induced by the solution of a Lyapunov equation for the closed-loop system. The projected (constrained) Riemannian gradient and Hessian are obtained from their unconstrained counterparts on $S$ by projecting onto the tangent space of $M$, that is, onto directions $\Delta$ with $C\,\mathrm{vec}(\Delta) = 0$.

The update step solves, in vector notation, a Newton system for a direction confined to this tangent space, and the step size $\eta$ is capped by a stability certificate to ensure Schur stability (Talebi et al., 2022).
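
As a simplified illustration of a Newton direction confined to the null space of a constraint matrix, the sketch below works in the plain Euclidean setting rather than with the Riemannian construction of Talebi et al. It assumes a generic gradient vector and Hessian matrix, builds an orthonormal basis `Z` of $\ker C$, and solves the reduced system $(Z^\top H Z)\,y = -Z^\top g$. All names and test data are hypothetical.

```python
import numpy as np

def nullspace_basis(C, rtol=1e-10):
    """Orthonormal basis Z of ker(C), taken from the SVD of C."""
    _, s, Vt = np.linalg.svd(C)
    rank = int(np.sum(s > rtol * s.max()))
    return Vt[rank:].T

def reduced_newton_direction(grad, hess, C):
    """Solve (Z^T H Z) y = -Z^T g and return delta = Z y, so that C delta = 0."""
    Z = nullspace_basis(C)
    y = np.linalg.solve(Z.T @ hess @ Z, -Z.T @ grad)
    return Z @ y

# Toy problem: minimize 0.5 x^T H x - b^T x subject to C x = d.
rng = np.random.default_rng(1)
n, p = 5, 2
M_ = rng.standard_normal((n, n))
H = M_ @ M_.T + n * np.eye(n)            # symmetric positive definite
b = rng.standard_normal(n)
C = rng.standard_normal((p, n))
d = rng.standard_normal(p)

x0 = np.linalg.pinv(C) @ d               # a feasible starting point
delta = reduced_newton_direction(H @ x0 - b, H, C)
x1 = x0 + delta                          # one full Newton step

# Reference solution from the KKT system [[H, C^T], [C, 0]] [x; lam] = [b; d].
kkt = np.block([[H, C.T], [C, np.zeros((p, p))]])
x_star = np.linalg.solve(kkt, np.concatenate([b, d]))[:n]
print(np.linalg.norm(x1 - x_star))
```

On a quadratic objective a single full step reaches the constrained minimizer, which is the mechanism behind the fast local convergence of second-order methods; for the LQR cost the step size would additionally be limited to preserve stability.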

LLM Safety Alignment

For RL-based safety alignment, the standard policy gradient $g_t$ of the safety objective at $\theta_t$ is projected: $\tilde{g}_t = P g_t$, where $P$ projects onto the null space of the general-capability representations $K$. Iteratively, $\theta$ is updated as $\theta_{t+1} = \theta_t - \eta\, P g_t$ with step size $\eta$ (Niu et al., 12 Dec 2025).

Key theoretical results include:

  • Gradient Norm Bound: $\|P g\| \le \|g\|$ (since $P$ is an orthogonal projector).
  • Descent Guarantee: For sufficiently small step size $\eta$, the projected update does not increase the objective, $\mathcal{L}(\theta_{t+1}) \le \mathcal{L}(\theta_t)$ (Niu et al., 12 Dec 2025).
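
Both properties can be checked numerically on a toy problem. The sketch below is not the LLM training setup: it assumes a simple quadratic loss, a random set of protected directions, and a fixed step size, all chosen only for illustration.

```python
import numpy as np

rng = np.random.default_rng(2)
dim, k = 8, 3

# Hypothetical protected directions: the columns of Kmat span the subspace to avoid.
Kmat = rng.standard_normal((dim, k))
Qb, _ = np.linalg.qr(Kmat)               # orthonormal basis of span(Kmat)
P = np.eye(dim) - Qb @ Qb.T              # orthogonal projector onto its complement

# Toy smooth loss L(theta) = 0.5 * ||theta - target||^2, with gradient theta - target.
target = rng.standard_normal(dim)

def loss(theta):
    return 0.5 * float(np.sum((theta - target) ** 2))

def grad(theta):
    return theta - target

theta0 = rng.standard_normal(dim)
theta = theta0.copy()
eta = 0.1
losses = [loss(theta)]
norm_pairs = []                          # (||P g||, ||g||) at every step
for _ in range(20):
    g = grad(theta)
    pg = P @ g
    norm_pairs.append((np.linalg.norm(pg), np.linalg.norm(g)))
    theta = theta - eta * pg             # projected update
    losses.append(loss(theta))

print(losses[0], losses[-1])
```

For this loss the decrease per step is $\eta(1 - \eta/2)\|P g\|^2$, so any $\eta \in (0, 2)$ gives descent, and the component of $\theta$ along the protected directions never changes.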

4. Algorithmic Procedures and Pseudocode

The canonical QRNPO/NSPO algorithm for feedback synthesis is as follows (Talebi et al., 2022):

  1. Input system $(A, B)$, constraints $(C, d)$, cost function $J$, and an initial feasible gain $K_0 \in M$.
  2. For $k = 0, 1, 2, \dots$ until convergence:
    • Compute ambient gradient and Hessian.
    • Project both gradient and Hessian with the null-space projector $P$.
    • Solve for the projected Newton direction $\Delta_k$.
    • Compute stability-certified step size $\eta_k$.
    • Update $K_{k+1} = K_k + \eta_k \Delta_k$, with $C\,\mathrm{vec}(\Delta_k) = 0$ so that the linear constraints remain satisfied.
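
The loop structure above can be sketched end to end on a small discrete-time LQR problem. The code below is a deliberately simplified stand-in, not the QRNPO algorithm: it uses a projected Euclidean gradient direction instead of a projected Newton direction, and a backtracking search that rejects unstable trial gains instead of a closed-form stability certificate. The cost $J(K) = \mathrm{tr}(P_K \Sigma)$ with control $u = Kx$, the weights, and the system matrices are assumptions made for the example.

```python
import numpy as np

def dlyap(F, W):
    """Solve X = F X F^T + W by vectorization (adequate for small systems)."""
    n = F.shape[0]
    x = np.linalg.solve(np.eye(n * n) - np.kron(F, F), W.flatten())
    return x.reshape(n, n)

def spectral_radius(M):
    return float(np.max(np.abs(np.linalg.eigvals(M))))

def lqr_cost_and_grad(K, A, B, Q, R, Sigma):
    """Cost tr(P Sigma) for u = K x and its Euclidean gradient in K."""
    Acl = A + B @ K
    P = dlyap(Acl.T, Q + K.T @ R @ K)    # P = Acl^T P Acl + Q + K^T R K
    Y = dlyap(Acl, Sigma)                # Y = Acl Y Acl^T + Sigma
    return float(np.trace(P @ Sigma)), 2.0 * (R @ K + B.T @ P @ Acl) @ Y

def constrained_lqr(K0, A, B, Q, R, Sigma, C, iters=200, eta0=1.0):
    m, n = K0.shape
    Pn = np.eye(m * n) - C.T @ np.linalg.solve(C @ C.T, C)
    K = K0.copy()
    J, G = lqr_cost_and_grad(K, A, B, Q, R, Sigma)
    history = [J]
    for _ in range(iters):
        # Descent direction restricted to ker(C), using column-stacking vec.
        D = -(Pn @ G.flatten(order="F")).reshape((m, n), order="F")
        eta = eta0
        # Backtrack until the trial gain is Schur stable and passes an Armijo test.
        while eta > 1e-12:
            K_try = K + eta * D
            if spectral_radius(A + B @ K_try) < 1.0:
                J_try, G_try = lqr_cost_and_grad(K_try, A, B, Q, R, Sigma)
                if J_try <= J - 1e-4 * eta * np.sum(D * D):
                    break
            eta *= 0.5
        else:
            break                        # no acceptable step found: stop
        K, J, G = K_try, J_try, G_try
        history.append(J)
    return K, history

# Hypothetical system; the constraint forces the second entry of K to zero.
A = np.array([[1.1, 0.3], [0.0, 0.9]])
B = np.array([[1.0], [0.5]])
Q, R, Sigma = np.eye(2), np.eye(1), np.eye(2)
C = np.array([[0.0, 1.0]])
K0 = np.array([[-0.6, 0.0]])             # feasible: stabilizing and constrained

K_opt, history = constrained_lqr(K0, A, B, Q, R, Sigma, C)
print(K_opt, history[0], history[-1])
```

Every iterate stays in $M$: the direction lies in $\ker C$, so the linear constraint is preserved exactly, and the line search only accepts Schur-stable gains.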

For LLM safety alignment (Niu et al., 12 Dec 2025):

  1. Extract hidden states $K$ from the base model on general-capability data.
  2. Compute $K K^\top$ and its SVD to assemble the projector $P$.
  3. Initialize the policy parameters $\theta_0$.
  4. For each RLHF iteration:
    • Sample a safety minibatch and compute the policy gradient $g_t$.
    • Project: $\tilde{g}_t = P g_t$.
    • Update: $\theta_{t+1} = \theta_t - \eta\, \tilde{g}_t$.
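
The procedure above can be mimicked on a single linear layer. In the sketch below the hidden states, the weight matrix, and the gradients are random stand-ins rather than quantities from a real model, and the layout is an assumption: hidden states are the columns of `K`, the layer computes `W @ K`, and the projection is therefore applied on the right of the gradient. The relative eigenvalue threshold is also a hypothetical choice.

```python
import numpy as np

def capability_projector(K, rel_threshold=1e-8):
    """Projector onto the near-null eigenspace of K K^T.

    K has shape (hidden_dim, num_samples); its columns are hidden states.
    """
    # K K^T is symmetric PSD, so its SVD coincides with its eigendecomposition.
    U, s, _ = np.linalg.svd(K @ K.T)
    U_null = U[:, s < rel_threshold * s.max()]
    return U_null @ U_null.T

rng = np.random.default_rng(3)
hidden_dim, num_samples, out_dim = 16, 6, 4

# Steps 1-2: stand-in hidden states and the projector built from them.
K = rng.standard_normal((hidden_dim, num_samples))
P = capability_projector(K)

# Step 3: stand-in layer weights.
W0 = rng.standard_normal((out_dim, hidden_dim))
W = W0.copy()

# Step 4: projected updates with random stand-ins for safety policy gradients.
eta = 0.05
for _ in range(10):
    g = rng.standard_normal(W.shape)
    g_proj = g @ P                       # remove components that act on span(K)
    W = W - eta * g_proj

# Outputs on the general-capability hidden states are unchanged up to round-off.
print(np.max(np.abs(W @ K - W0 @ K)))
```

The weights change, but `W @ K` does not, which is the invariance the projection is meant to provide. With real activations the spectrum of $K K^\top$ rarely has an exact null space, so the threshold controls how strictly capabilities are protected.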

5. Empirical Findings and Theoretical Guarantees

In the LLM alignment context, NSPO achieves:

  • State-of-the-art safety (e.g., lowest Attack Success Rates across seven safety benchmarks), with 5–15 point reductions relative to previous methods.
  • General capabilities (math, code, instruction following) are preserved with only a marginal absolute drop, matching baselines that require mixed-task data.
  • Data efficiency: only a fraction of the publicly available safety data is required, with no need to interleave general-task examples.
  • Computational overhead is nominal: the SVD is computed once offline, each step adds one projection with the precomputed projector, and the extra GPU memory the projector occupies can typically be offloaded (Niu et al., 12 Dec 2025).

For linear-quadratic regulators, NSPO attains local quadratic convergence rates under standard second-order conditions, outpacing projected-gradient methods, which yield only sublinear rates (Talebi et al., 2022).

| Context | Constraint Type | Projection Matrix | Core Theoretical Guarantee |
|---|---|---|---|
| LQR | Linear in $\mathrm{vec}(K)$ | $I - C^\top (C C^\top)^{-1} C$ | Local quadratic convergence |
| LLM Alignment | Column span of $K$ | $\hat{U}\hat{U}^\top$ (near-null eigenvectors of $K K^\top$) | Descent, invariance of $K$ |

6. Applications and Extensions

NSPO provides a principled solution for a wide spectrum of constrained control and learning problems where strict invariance or preservation properties are required:

  • Feedback Control: Contemporary advances in constrained synthesis over the Schur-stabilizing manifold permit efficient and geometrically well-founded solutions without recourse to manifold retraction or exponential mapping (Talebi et al., 2022).
  • LLM Safety Alignment: NSPO enables reinforcement learning from human feedback to improve safety without degradation of general-purpose skills, directly addressing the so-called alignment tax problem by projecting updates into capability-preserving subspaces (Niu et al., 12 Dec 2025).

A plausible implication is that NSPO can be extended to any learning setting where certain invariances or constraints must be strictly enforced, including robust control, constrained reinforcement learning beyond LLMs, and safe learning under domain-specific operational constraints.

7. Comparison with Projected and Unconstrained Methods

NSPO delivers strict constraint satisfaction by construction. Standard projected-gradient methods take an unconstrained step and then project back onto the constraint set, which generally results in slower, sublinear convergence and potential infeasibility between steps. NSPO instead exploits the geometry of the constraint manifold: its second-order updates achieve local quadratic rates, and iterates remain feasible after each step. In LLM alignment, prior RLHF approaches require continual mixing of general-task data; NSPO removes this requirement through explicit projection, confining safety-driven updates to the null space of the general-capability representations (Talebi et al., 2022, Niu et al., 12 Dec 2025).
