NSPO: Null-Space Constrained Policy Optimization
- NSPO is a family of geometric algorithms that restrict policy updates to the null space of constraint matrices, maintaining key invariants.
- It combines null-space projections with Riemannian (Newton-type) updates, achieving local quadratic convergence in constrained control synthesis and improved safety in LLM alignment.
- Applications span Schur-stable feedback control and reinforcement-learning-based LLM safety alignment, where NSPO improves safety metrics with minimal impact on general capabilities.
Null-Space Constrained Policy Optimization (NSPO) is a family of geometric algorithms for policy optimization under explicit linear constraints. Its core principle is to confine gradient or Newton-style policy updates to the null space of specified constraint matrices, so that policy parameters vary only in directions that leave predefined invariants or capabilities intact. The framework applies both to feedback control synthesis under linear constraints and, more recently, to reinforcement learning for large-language-model (LLM) alignment, where it enables safety alignment without sacrificing general-purpose abilities.
1. Mathematical Foundation and Constraints
NSPO formalizes the constrained policy space as the intersection of a feasible set (e.g., stabilizing policies or general-capability-preserving policies) with an affine subspace defined by explicit linear constraints. For Schur-stable feedback synthesis, let $K \in \mathbb{R}^{m \times n}$ be a feedback gain, $\mathcal{S}$ the (open) set of Schur-stabilizing gains for system matrices $(A, B)$, i.e., gains whose closed-loop matrix has spectral radius less than one, and $\langle E_i, K \rangle = c_i$, $i = 1, \dots, p$, a family of linear constraints. The feasible manifold is

$$\tilde{\mathcal{S}} = \{\, K \in \mathcal{S} : \langle E_i, K \rangle = c_i,\ i = 1, \dots, p \,\},$$

which is an embedded submanifold of dimension $mn - p$ under standard rank conditions (linearly independent $E_i$ and a nonempty intersection) (Talebi et al., 2022).
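As a concrete reading of this definition, the sketch below tests whether a gain lies in $\tilde{\mathcal{S}}$. It assumes the convention $u = -Kx$ (closed loop $A - BK$), constraints stacked as $E\,\operatorname{vec}(K) = c$ with NumPy's row-major `K.ravel()` playing the role of $\operatorname{vec}$, and a hypothetical helper name `in_feasible_manifold`; none of these choices come from the cited work.

```python
import numpy as np


def in_feasible_manifold(K, A, B, E, c, tol=1e-9):
    """Return True if K is Schur-stabilizing and satisfies E @ vec(K) = c.

    Assumed conventions (illustrative): u = -K x, so the closed loop is A - B K,
    and vec(K) is the row-major flattening K.ravel().
    """
    spectral_radius = np.max(np.abs(np.linalg.eigvals(A - B @ K)))
    constraints_ok = np.allclose(E @ K.ravel(), c, atol=tol)
    return bool(spectral_radius < 1.0 and constraints_ok)


# Toy data: E pins both off-diagonal gains to zero (a decentralization constraint).
A = np.array([[0.9, 0.3], [0.0, 0.8]])
B = np.eye(2)
E = np.array([[0.0, 1.0, 0.0, 0.0],   # picks K[0, 1]
              [0.0, 0.0, 1.0, 0.0]])  # picks K[1, 0]
c = np.zeros(2)
print(in_feasible_manifold(np.zeros((2, 2)), A, B, E, c))            # True
print(in_feasible_manifold(np.diag([-0.5, 0.0]), A, B, E, c))        # False: rho(A - BK) = 1.4
print(in_feasible_manifold(np.array([[0.0, 0.5], [0.0, 0.0]]), A, B, E, c))  # False: violates E
```

The second gain satisfies the linear constraints but is not stabilizing, and the third is stabilizing but violates a constraint, which is why membership requires both checks.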
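In safety alignment for LLMs, the constraint is that parameter updates must not interfere with general-capability representations, encoded as hidden-state vectors collected in a matrix $K \in \mathbb{R}^{d \times N}$. The linear constraint is that a weight matrix $W$ is updated only in directions orthogonal to the column span of $K$, i.e., $\Delta W\, K = 0$, so that the outputs $W K$ on general-capability inputs are unchanged (Niu et al., 12 Dec 2025).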
2. Null-Space Projections and Geometry
The null-space projector keeps updates feasible. For linear constraints stacked as $E\,\operatorname{vec}(K) = c$ (with $E$ of full row rank), the feasible (tangent) directions form $\ker E$, and the Euclidean orthogonal projector onto them is $\Pi = I - E^\top (E E^\top)^{-1} E$. For LLM alignment, the null space is obtained from the SVD of the symmetric matrix $K K^\top = U \Lambda U^\top$: the columns of $U$ whose eigenvalues $\lambda_i$ fall below a threshold form $\hat{U}$, yielding the projector $P = \hat{U} \hat{U}^\top$ (Niu et al., 12 Dec 2025).
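The constraint-side projector can be sketched in a few lines of NumPy, assuming $E$ has full row rank and the Euclidean inner product; `tangent_projector` is a hypothetical name, and the random data only serve to exercise the properties of $\Pi$.

```python
import numpy as np


def tangent_projector(E):
    """Euclidean orthogonal projector onto ker(E), assuming E has full row rank:
    Pi = I - E^T (E E^T)^{-1} E."""
    n = E.shape[1]
    return np.eye(n) - E.T @ np.linalg.solve(E @ E.T, E)


rng = np.random.default_rng(0)
E = rng.standard_normal((2, 6))        # two random linear constraints on a 6-dim vec(K)
Pi = tangent_projector(E)
v = rng.standard_normal(6)             # an arbitrary ambient direction
print(np.allclose(E @ (Pi @ v), 0))    # True: projected direction is feasible
print(np.allclose(Pi @ Pi, Pi), np.allclose(Pi, Pi.T))  # True True: orthogonal projector
```

Under a non-Euclidean metric with matrix $M$ (for example one induced by a Lyapunov solution), the $M$-orthogonal projector onto $\ker E$ is instead $I - M^{-1}E^\top (E M^{-1} E^\top)^{-1} E$.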
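These projections guarantee that any controller update $\Delta K$ satisfies $E\,\operatorname{vec}(\Delta K) = 0$, so $K + \Delta K$ still meets the constraints, or, in the LLM case, that an update of the form $\Delta W P$ leaves the hidden-state representations of the general-capability data invariant, since $\Delta W P K \approx 0$ (exactly zero when only zero eigenvalues are retained).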
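A minimal sketch of the LLM-side projector follows, assuming hidden states are stacked column-wise in a NumPy array `K` of shape `(d, N)` and an absolute threshold `tau` (the exact threshold rule of the cited work is not reproduced here); `null_space_projector` is a hypothetical name. The usage lines check the invariance claim on random toy data.

```python
import numpy as np


def null_space_projector(K, tau=1e-8):
    """P = U_hat U_hat^T, where U_hat holds the singular vectors of K K^T whose
    singular values (equal to eigenvalues, since K K^T is symmetric PSD) are below tau."""
    U, S, _ = np.linalg.svd(K @ K.T)
    U_hat = U[:, S < tau]
    return U_hat @ U_hat.T


rng = np.random.default_rng(0)
K = rng.standard_normal((8, 3))          # 3 toy hidden-state vectors in R^8
P = null_space_projector(K)
W = rng.standard_normal((4, 8))          # a weight matrix acting on these hidden states
dW = rng.standard_normal((4, 8))         # an arbitrary raw update
W_new = W + dW @ P                       # projected update
print(np.allclose(W_new @ K, W @ K))     # True: outputs on K are unchanged
```

Because `K` has rank 3 in $\mathbb{R}^8$, $P$ projects onto a 5-dimensional subspace, and any update routed through it cannot change $W K$.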
3. NSPO Optimization Algorithms
Feedback Controller Synthesis
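For linearly constrained quadratic-regulator design, a Riemannian metric is placed on $\tilde{\mathcal{S}}$, induced by the solution $Y_K$ of the closed-loop Lyapunov equation. With $\Pi_K$ the orthogonal projector (in this metric) onto the tangent space $T_K\tilde{\mathcal{S}} = \{V : \langle E_i, V \rangle = 0,\ i = 1, \dots, p\}$ and $\bar{\nabla}$ the Levi-Civita connection of the ambient metric on $\mathcal{S}$, the projected (constrained) Riemannian gradient and Hessian are:
- $\operatorname{grad} f(K) = \Pi_K\big(\overline{\operatorname{grad}}\, f(K)\big)$, where $\overline{\operatorname{grad}}\, f$ is the Riemannian gradient on $\mathcal{S}$;
- $\operatorname{Hess} f(K)[V] = \Pi_K\big(\bar{\nabla}_V \operatorname{grad} f(K)\big)$ for $V \in T_K\tilde{\mathcal{S}}$.
The update direction $G_t$ solves the Newton equation (in vectorized form) $\operatorname{Hess} f(K_t)[G_t] = -\operatorname{grad} f(K_t)$ with $G_t \in T_{K_t}\tilde{\mathcal{S}}$, and the step $K_{t+1} = K_t + \eta_t G_t$ uses a step size $\eta_t$ capped by a stability certificate to preserve Schur stability (Talebi et al., 2022).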
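The constrained Newton step can be illustrated in vectorized coordinates. The sketch below is a simplification rather than the cited method: it takes a given symmetric Hessian matrix `H` and gradient `g` (Euclidean rather than Riemannian quantities) and solves the Newton system restricted to $\ker E$ in an orthonormal null-space basis; `projected_newton_direction` is a hypothetical name.

```python
import numpy as np


def projected_newton_direction(g, H, E):
    """Newton direction restricted to ker(E).

    Solves  min_d  g^T d + 0.5 d^T H d  subject to  E d = 0
    by writing d = N z, with N an orthonormal basis of ker(E).
    Assumes N^T H N is positive definite (e.g., near a strict local minimizer).
    """
    _, s, Vt = np.linalg.svd(E)
    rank = int(np.sum(s > 1e-12 * s.max()))
    N = Vt[rank:].T                               # orthonormal basis of ker(E)
    z = np.linalg.solve(N.T @ H @ N, -N.T @ g)    # reduced Newton system
    return N @ z


rng = np.random.default_rng(1)
M = rng.standard_normal((6, 6))
H = M @ M.T + 6 * np.eye(6)            # symmetric positive definite toy Hessian
g = rng.standard_normal(6)
E = rng.standard_normal((2, 6))
d = projected_newton_direction(g, H, E)
print(np.allclose(E @ d, 0))           # True: the direction keeps the constraints
```

The reduced system enforces the optimality condition that $Hd + g$ is orthogonal to every feasible direction, which is exactly what "projecting both gradient and Hessian" achieves.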
LLM Safety Alignment
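For RL-based safety alignment, the policy gradient is projected before each update: for a weight matrix $W$ and safety loss $\mathcal{L}$ (the negated RL objective), $\tilde{G} = \nabla_W \mathcal{L}(W)\, P$, where $P$ projects onto the null space of the general-capability representations $K$. The weights are then updated iteratively as $W_{t+1} = W_t - \eta\, \nabla_W \mathcal{L}(W_t)\, P$ (Niu et al., 12 Dec 2025).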
Key theoretical results include:
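- Gradient Norm Bound: $\|\nabla_W \mathcal{L}(W)\, P\|_F \le \|\nabla_W \mathcal{L}(W)\|_F$, since $P$ is an orthogonal projector ($\|P\|_2 \le 1$).
- Descent Guarantee: because $\langle \nabla_W \mathcal{L}, \nabla_W \mathcal{L}\, P \rangle = \|\nabla_W \mathcal{L}\, P\|_F^2 \ge 0$, for sufficiently small step size $\eta$ we have $\mathcal{L}(W_{t+1}) \le \mathcal{L}(W_t)$, with strict decrease whenever $\nabla_W \mathcal{L}(W_t)\, P \neq 0$ (Niu et al., 12 Dec 2025).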
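Both guarantees can be checked numerically on a toy problem. This sketch is illustrative only: a least-squares loss stands in for the RL safety objective, the projector is built from a QR factorization of `K` (equivalent to the SVD construction when `K` has full column rank and only zero eigenvalues are kept), and the step size $\eta = 1/\beta$ uses the smoothness constant $\beta$ of this quadratic loss; all data are random.

```python
import numpy as np

rng = np.random.default_rng(0)
d_in, d_out = 8, 4
K = rng.standard_normal((d_in, 3))       # toy general-capability hidden states
X = rng.standard_normal((d_in, 20))      # toy "safety" inputs
Y = rng.standard_normal((d_out, 20))     # toy "safety" targets


def loss(W):
    # Least-squares stand-in for the safety loss L(W)
    return 0.5 * np.sum((W @ X - Y) ** 2)


def grad(W):
    return (W @ X - Y) @ X.T


# Orthogonal projector onto the orthogonal complement of span(K)
Q, _ = np.linalg.qr(K)
P = np.eye(d_in) - Q @ Q.T

W = rng.standard_normal((d_out, d_in))
G = grad(W)
G_proj = G @ P
beta = np.linalg.eigvalsh(X @ X.T).max()   # smoothness constant of this loss
eta = 1.0 / beta
W_next = W - eta * G_proj

# Gradient norm bound: ||G P||_F <= ||G||_F
print(np.linalg.norm(G_proj) <= np.linalg.norm(G))                                # True
# Descent with eta = 1 / beta: L(W_next) <= L(W) - (eta / 2) ||G P||_F^2
print(loss(W_next) <= loss(W) - 0.5 * eta * np.linalg.norm(G_proj) ** 2 + 1e-9)  # True
```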
4. Algorithmic Procedures and Pseudocode
The canonical QRNPO/NSPO algorithm for feedback synthesis is as follows (Talebi et al., 2022):
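- Input: system $(A, B)$, constraints $\{(E_i, c_i)\}_{i=1}^{p}$, cost function $f$, and an initial feasible gain $K_0 \in \tilde{\mathcal{S}}$.
- For $t = 0, 1, 2, \dots$ until convergence:
  - Compute the ambient gradient and Hessian at $K_t$.
  - Project both with $\Pi_{K_t}$.
  - Solve for the projected Newton direction $G_t$.
  - Compute a stability-certified step size $\eta_t$.
  - Update $K_{t+1} = K_t + \eta_t G_t$, which remains in $\tilde{\mathcal{S}}$.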
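An end-to-end toy run of this loop is sketched below under several stated substitutions, so it should be read as an illustration rather than the cited implementation: the system, the diagonal (decentralized) gain constraint and the convention $u = -Kx$ are hypothetical; the Euclidean metric replaces the Lyapunov-induced Riemannian metric; the reduced Hessian comes from central differences of the standard discrete-time LQR policy gradient $\nabla f(K) = 2[(R + B^\top P_K B)K - B^\top P_K A]\,Y_K$; and a backtracking search on stability and cost stands in for the stability certificate.

```python
import numpy as np

# Hypothetical toy problem: x+ = A x + B u with u = -K x, cost f(K) = tr(P_K Sigma),
# and the constraint K[0, 1] = K[1, 0] = 0 written as E @ K.ravel() = 0.
A = np.array([[0.9, 0.3], [0.0, 0.8]])
B = np.eye(2)
Q, R, Sigma = np.eye(2), np.eye(2), np.eye(2)
E = np.array([[0.0, 1.0, 0.0, 0.0],
              [0.0, 0.0, 1.0, 0.0]])


def dlyap(F, C):
    """Solve X = F X F^T + C by vectorization (adequate for tiny systems)."""
    n = F.shape[0]
    x = np.linalg.solve(np.eye(n * n) - np.kron(F, F), C.ravel())
    return x.reshape(n, n)


def rho(K):
    """Spectral radius of the closed loop A - B K."""
    return np.max(np.abs(np.linalg.eigvals(A - B @ K)))


def cost(K):
    AK = A - B @ K
    P = dlyap(AK.T, Q + K.T @ R @ K)   # P_K = Q + K^T R K + AK^T P_K AK
    return np.trace(P @ Sigma)


def grad(K):
    """Euclidean LQR policy gradient 2[(R + B^T P_K B) K - B^T P_K A] Y_K."""
    AK = A - B @ K
    P = dlyap(AK.T, Q + K.T @ R @ K)
    Y = dlyap(AK, Sigma)               # Y_K = Sigma + AK Y_K AK^T
    return 2.0 * ((R + B.T @ P @ B) @ K - B.T @ P @ A) @ Y


# Orthonormal basis of ker(E): the feasible directions (E has full row rank here).
_, s, Vt = np.linalg.svd(E)
N = Vt[len(s):].T


def projected_newton(K, iters=50, h=1e-6, tol=1e-10):
    for _ in range(iters):
        g = N.T @ grad(K).ravel()              # projected gradient in basis N
        if np.linalg.norm(g) < tol:
            break
        # Reduced Hessian N^T H N by central differences of the gradient along N.
        cols = []
        for v in N.T:
            V = v.reshape(K.shape)
            cols.append(N.T @ (grad(K + h * V) - grad(K - h * V)).ravel() / (2 * h))
        H = np.column_stack(cols)
        H = 0.5 * (H + H.T)
        z = np.linalg.solve(H, -g)
        if g @ z >= 0:                         # not a descent direction: use -g instead
            z = -g
        D = (N @ z).reshape(K.shape)
        # Backtracking stand-in for the stability certificate:
        # keep A - B K Schur-stable and do not let the cost increase.
        eta = 1.0
        while eta > 1e-12 and (rho(K + eta * D) >= 1.0
                               or cost(K + eta * D) > cost(K) + 1e-10):
            eta *= 0.5
        K = K + eta * D
    return K


K0 = np.zeros((2, 2))                   # feasible start: rho(A) = 0.9 < 1
K_star = projected_newton(K0)
print(K_star)                            # off-diagonal entries stay (numerically) zero
print(cost(K0), cost(K_star))            # cost decreases
```

Because every direction lies in $\ker E$ and the constraints are affine, the straight-line update never leaves the constraint subspace, so no retraction is needed; only stability has to be enforced by the step size.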
For LLM safety alignment (Niu et al., 12 Dec 2025):
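- Extract hidden states $K$ from the base model on general-capability data.
- Compute $K K^\top$ and its SVD to assemble $P = \hat{U}\hat{U}^\top$.
- Initialize $W_0$ to the base-model weights.
- For each RLHF iteration $t$:
  - Sample a safety minibatch and compute the policy gradient $\nabla_W \mathcal{L}(W_t)$.
  - Project: $\tilde{G}_t = \nabla_W \mathcal{L}(W_t)\, P$.
  - Update: $W_{t+1} = W_t - \eta\, \tilde{G}_t$.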
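The LLM-side procedure can be mimicked on a single linear layer. In this sketch (NumPy, random data, all names hypothetical), a mean-squared loss on a toy safety set replaces the RL policy gradient, and the SVD threshold is taken relative to the largest singular value; the point is only to show that projected updates lower the stand-in safety loss while leaving the layer's outputs on the general-capability hidden states unchanged.

```python
import numpy as np

rng = np.random.default_rng(0)
d_in, d_out = 16, 4

# Toy stand-ins (hypothetical): K = hidden states on general-capability data (rank 5),
# (X_safe, Y_safe) = safety data with a squared loss replacing the RL objective.
K = rng.standard_normal((d_in, 5)) @ rng.standard_normal((5, 200))
X_safe = rng.standard_normal((d_in, 64))
Y_safe = rng.standard_normal((d_out, 64))
W_base = rng.standard_normal((d_out, d_in))


def safety_loss(W):
    return 0.5 * np.mean(np.sum((W @ X_safe - Y_safe) ** 2, axis=0))


def safety_grad(W):
    return (W @ X_safe - Y_safe) @ X_safe.T / X_safe.shape[1]


# Offline: SVD of K K^T; keep singular directions below a relative threshold.
U, S, _ = np.linalg.svd(K @ K.T)
U_hat = U[:, S < 1e-8 * S.max()]
P = U_hat @ U_hat.T

# Training loop: projected gradient steps starting from the base weights.
W = W_base.copy()
eta = 0.1
for t in range(200):
    W = W - eta * safety_grad(W) @ P

print(safety_loss(W_base), safety_loss(W))     # stand-in safety loss goes down
print(np.allclose(W @ K, W_base @ K))          # True: general-capability outputs preserved
```

Note that no general-task data enter the loop: preservation comes entirely from $P$, which is computed once before training.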
5. Empirical Findings and Theoretical Guarantees
In the LLM alignment context, NSPO achieves:
- State-of-the-art safety (e.g., lowest Attack Success Rates across seven safety benchmarks), with 5–15 point reductions relative to previous methods.
- General capabilities (math, code, instruction following) are preserved with only a small absolute drop, matching baselines that require mixed-task data.
- Data efficiency: only a fraction of the publicly available safety data is required, with no need to interleave general-task examples.
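- Computational overhead is modest: the SVD of the $d \times d$ matrix $K K^\top$ is computed once offline; once $P$ is available, each projection $\nabla_W \mathcal{L}\, P$ for $W \in \mathbb{R}^{d_{\text{out}} \times d}$ costs $O(d_{\text{out}} d^2)$; storing $P$ needs $O(d^2)$ extra GPU memory per projected layer, typically offloaded (Niu et al., 12 Dec 2025).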
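To give a sense of scale, assume (for illustration only, not figures from the cited work) a projected layer with input dimension $d = 4096$, output dimension $d_{\text{out}} = 4096$, and fp32 storage for $P$:

$$
\begin{aligned}
\text{memory}(P) &= d^2 \times 4\ \text{bytes} = 4096^2 \times 4 = 67{,}108{,}864\ \text{bytes} = 64\ \text{MiB},\\
\text{cost}(\nabla_W \mathcal{L}\, P) &= d_{\text{out}}\, d^2 = 4096^3 \approx 6.87 \times 10^{10}\ \text{multiply-adds per step}.
\end{aligned}
$$

The memory for $P$ grows linearly with the number of projected layers, which is why offloading it is attractive.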
For linear-quadratic regulators, NSPO attains local quadratic convergence under standard second-order conditions, outpacing projected-gradient methods, which attain only sublinear rates (Talebi et al., 2022).
| Context | Constraint Type | Projection Matrix | Core Theoretical Guarantee |
|---|---|---|---|
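| LQR | Linear in the gain $K$ | $\Pi_K$, orthogonal projector onto $\ker E$ | Local quadratic convergence |
| LLM Alignment | Orthogonal to the column span of the hidden states $K$ | $P = \hat{U}\hat{U}^\top$ | Descent; invariance of $K$ |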
6. Applications and Extensions
NSPO provides a principled solution for a wide spectrum of constrained control and learning problems where strict invariance or preservation properties are required:
- Feedback Control: Constrained synthesis over the Schur-stabilizing manifold admits efficient, geometrically well-founded solutions without recourse to retractions or exponential maps: because the constraints are affine, the update $K_t + \eta_t G_t$ stays on the constraint subspace, and the stability-certified step size keeps it Schur-stable (Talebi et al., 2022).
- LLM Safety Alignment: NSPO enables reinforcement learning from human feedback to improve safety without degradation of general-purpose skills, directly addressing the so-called alignment tax problem by projecting updates into capability-preserving subspaces (Niu et al., 12 Dec 2025).
A plausible implication is that NSPO can be extended to any learning setting where certain invariances or constraints must be strictly enforced, including robust control, constrained reinforcement learning beyond LLMs, and safe learning under domain-specific operational constraints.
7. Comparison with Projected and Unconstrained Methods
NSPO delivers strict constraint satisfaction by construction. Standard projected-gradient methods first take an unconstrained step and then project back onto the constraint set, which generally yields slower, sublinear convergence and can leave intermediate iterates infeasible. NSPO instead exploits the geometry of the constraint manifold: its second-order updates achieve local quadratic rates, and every iterate remains feasible after each step. In LLM alignment, prior RLHF approaches require continual mixing of general-task data; NSPO removes this requirement by explicit projection, confining safety-driven updates to the null space of the general-capability representations (Talebi et al., 2022, Niu et al., 12 Dec 2025).