GRIP: Geometric Routing Invariance in MoE Unlearning

Updated 28 January 2026
  • GRIP is an algorithm-agnostic framework that enforces invariant expert selection in Mixture-of-Experts models during machine unlearning.
  • It leverages null-space projection of router gradients to prevent routing manipulation and ensure that knowledge is directly removed from expert weights.
  • Empirical evaluations show that GRIP achieves high routing stability and preserves model utility, even under adversarial expert-forcing attacks, at modest computational overhead.

Geometric Routing Invariance Preservation (GRIP) is an algorithm-agnostic framework designed to enforce expert-selection invariance in Mixture-of-Experts (MoE) models during machine unlearning. GRIP operates by constraining router parameter updates to the null space of retained-set hidden representations, preventing unlearning methods from exploiting routing manipulation shortcuts. This ensures that knowledge is erased directly from expert weights rather than by superficially hiding it through expert selection shifts, thereby maintaining model utility and robust unlearning effectiveness in large-scale MoE architectures (Zhu et al., 23 Jan 2026).

1. Background and Problem Formulation

MoE architectures in modern LLMs partition computation across multiple trainable experts per layer, with a trainable router $\Theta_\ell \in \mathbb{R}^{E \times d}$ computing selection scores $s_\ell = \Theta_\ell x_\ell$ for input $x_\ell \in \mathbb{R}^d$. The router selects the top-$k$ experts to process each input. Traditional machine unlearning methods—including those based on maximizing loss on “forget” data or utilizing Kullback-Leibler divergence anchors—were designed for dense networks. When applied to MoE, these methods frequently manipulate the router so that queries avoid “knowledgeable” experts, a phenomenon termed Expert Selection Shift. This behavior attenuates model utility by hiding information rather than genuinely erasing it. To counter this, routing invariance must be preserved: the set of selected experts for any retained (non-forgotten) input must remain unchanged after unlearning.
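The routing step described above can be sketched in a few lines; all shapes and values here are illustrative, not taken from any particular model:

```python
import numpy as np

# Minimal sketch of top-k MoE routing: the router Theta scores all E experts
# for an input x, and the k highest-scoring experts are selected.
rng = np.random.default_rng(0)
E, d, k = 8, 16, 2                     # illustrative expert count, hidden size, top-k
Theta = rng.standard_normal((E, d))    # router weight matrix (E x d)
x = rng.standard_normal(d)             # one hidden-state input

s = Theta @ x                          # selection scores s_l, shape (E,)
topk = np.argsort(s)[-k:][::-1]        # indices of the k highest-scoring experts
```

Expert Selection Shift corresponds to an update of `Theta` that changes `topk` for retained inputs, which is exactly what GRIP forbids.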

Routing stability is quantified per layer by a Jaccard-based metric over a query set $\mathcal{Q}$:

$$RS_\ell = \frac{1}{|\mathcal{Q}|} \sum_{x \in \mathcal{Q}} \frac{|S_\ell^{\mathrm{pre}}(x) \cap S_\ell^{\mathrm{post}}(x)|}{|S_\ell^{\mathrm{pre}}(x) \cup S_\ell^{\mathrm{post}}(x)|}$$

where $S_\ell^{\mathrm{pre}}(x)$ and $S_\ell^{\mathrm{post}}(x)$ are the expert selections pre- and post-unlearning, respectively (Zhu et al., 23 Jan 2026).
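A direct implementation of this metric, assuming top-$k$ selection as defined in the previous paragraph (the helper `select` is illustrative, not from the paper's code):

```python
import numpy as np

def select(Theta, x, k):
    """Top-k expert indices for input x under router weights Theta."""
    return set(np.argsort(Theta @ x)[-k:])

def routing_stability(Theta_pre, Theta_post, queries, k):
    """Mean Jaccard similarity of expert selections pre- vs post-unlearning."""
    jaccards = []
    for x in queries:
        pre, post = select(Theta_pre, x, k), select(Theta_post, x, k)
        jaccards.append(len(pre & post) / len(pre | post))
    return float(np.mean(jaccards))

rng = np.random.default_rng(1)
Theta = rng.standard_normal((8, 16))
queries = list(rng.standard_normal((5, 16)))
# An unchanged router gives RS = 1.0 by construction.
rs = routing_stability(Theta, Theta, queries, k=2)
```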

2. Null-Space Geometric Constraint

The central theoretical tool in GRIP is the enforcement of a null-space constraint on router updates. For a given MoE layer $\ell$, let $X_{r,\ell} \in \mathbb{R}^{d \times N_r}$ denote the matrix of retained-set hidden representations. The router logit scores are $S_\ell = \Theta_\ell X_{r,\ell}$. Invariance to router updates $\Delta\Theta_\ell$ is enforced by requiring

$$(\Theta_\ell + \Delta\Theta_\ell) X_{r,\ell} = \Theta_\ell X_{r,\ell} \iff \Delta\Theta_\ell X_{r,\ell} = 0$$

This constraint implies that feasible updates $\Delta\Theta_\ell$ reside in the left null space $\mathcal{N}(X_{r,\ell}) = \{M \in \mathbb{R}^{E \times d} : M X_{r,\ell} = 0\}$. Practically, GRIP projects the unconstrained router gradient $\nabla_{\Theta_\ell} L$ onto $\mathcal{N}(X_{r,\ell})$ using the eigendecomposition of $X_{r,\ell} X_{r,\ell}^\top$ and a projector $P_{\mathcal{N}(X_{r,\ell})} = \bar{U}_\ell \bar{U}_\ell^\top$, where $\bar{U}_\ell$ contains eigenvectors with eigenvalues below a small threshold $\epsilon \approx 10^{-2}$.
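The projector construction can be sketched as follows; the shapes and threshold are illustrative assumptions, and this is not the authors' implementation:

```python
import numpy as np

# Build the null-space projector from retained-set representations X_r (d x N_r)
# and apply it to an unconstrained router gradient G (E x d).
rng = np.random.default_rng(0)
d, N_r, E = 32, 10, 8
X_r = rng.standard_normal((d, N_r))
G = rng.standard_normal((E, d))

C = X_r @ X_r.T                          # d x d Gram matrix of retained representations
eigvals, eigvecs = np.linalg.eigh(C)     # eigenvalues in ascending order
eps = 1e-2
U_bar = eigvecs[:, eigvals < eps]        # basis of the (near-)null space
P_null = U_bar @ U_bar.T                 # symmetric projector onto null(X_r^T)

# Project each row of the gradient onto the null space (right-multiplication,
# since rows of G live in R^d). The projected update no longer moves
# retain-set logits: G_proj @ X_r is numerically zero.
G_proj = G @ P_null
```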

For fine-grained control, expert-specific constraints are defined via Jacobians $E_j$ constructed from retained inputs mapped to expert $j$, with analogous null-space projections $P_{\mathcal{N}(E_j)} = I_d - E_j (E_j^\top E_j)^{\dagger} E_j^\top$ (Zhu et al., 23 Jan 2026).

3. GRIP Adapter Algorithm

GRIP is implemented as a wrapper around arbitrary gradient-based unlearning routines, making them “routing-aware” without altering their fundamental update rules or loss functions. The workflow involves precomputing projector matrices per layer based on retained-set representations, then, at each optimization step, projecting the router gradients into the null space defined by these projectors. Optionally, per-expert half-space projections enforce margin constraints on non-selected inputs.

A high-level overview of the core GRIP-Adapter pseudocode:

Algorithm GRIP-Adapter(BaseUnlearning, retained representations {X_{r,ℓ}}, routers {Θ_ℓ}, top-k, ε, margin τ)
  Precompute, for each layer ℓ:
    C_ℓ ← X_{r,ℓ} X_{r,ℓ}ᵀ; eigendecompose C_ℓ to obtain the null-space projector P_null,ℓ
  for each unlearning step:
    BaseUnlearning.step()
    for each layer ℓ:
      obtain the router gradient ∇Θ_ℓ from the base optimizer
      project onto the null space: ∇Θ_ℓ ← ∇Θ_ℓ P_null,ℓ
      optionally apply expert-specific margin constraints via half-space projection
      overwrite Θ_ℓ's gradient with the projected result

A Post-Training Correction (PTC) variant performs a single least-squares step after unlearning to exactly restore the original router scores on the retain set: $\Delta\Theta_\ell = \Theta_\ell (X_{r,\ell} - X'_{r,\ell}) (X'_{r,\ell})^{\dagger}$, where $X'_{r,\ell}$ denotes the retained-set representations after unlearning, with computational cost $O(d^3)$ per layer, i.e. $O(L d^3)$ overall (Zhu et al., 23 Jan 2026).
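A minimal numerical sketch of this correction step, under assumed shapes (the drift magnitude and dimensions are illustrative):

```python
import numpy as np

# After unlearning, retained-set representations drift from X_r to X_r_post.
# A single least-squares update Delta = Theta (X_r - X_r_post) (X_r_post)^+
# restores the original router scores Theta @ X_r on the retain set.
rng = np.random.default_rng(0)
d, N_r, E = 16, 6, 4
Theta = rng.standard_normal((E, d))
X_r = rng.standard_normal((d, N_r))
X_r_post = X_r + 0.01 * rng.standard_normal((d, N_r))   # drifted representations

Delta = Theta @ (X_r - X_r_post) @ np.linalg.pinv(X_r_post)
Theta_corrected = Theta + Delta
# Theta_corrected @ X_r_post now equals the pre-unlearning scores Theta @ X_r.
```

The pseudo-inverse is the source of the $O(d^3)$ per-layer cost noted above.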

4. Separation of Routing Stability and Router Plasticity

By enforcing $\Delta\Theta_\ell X_{r,\ell} = 0$, GRIP fixes discrete top-$k$ expert selections for all retain-set inputs, achieving strict invariance in expert selection. However, since the hidden dimension $d$ typically exceeds the rank of $X_{r,\ell}$, the null space is high-dimensional, which leaves substantial freedom (“plasticity”) for router parameters to change in directions that do not affect routing decisions. As a result, the router can still respond and adapt to unlearning gradients arising from the forget set, as long as these adjustments do not alter the original routing structure for retained data. This decouples the invariance of expert selection from the flexibility of router parameterization, compelling unlearning algorithms to genuinely remove knowledge from expert weights rather than superficially evading it through routing drift.
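This separation can be checked numerically; the sketch below (illustrative shapes, not the paper's code) applies a null-space-projected update and verifies that the router changed while every retain-set top-$k$ selection did not:

```python
import numpy as np

rng = np.random.default_rng(0)
d, N_r, E, k = 32, 8, 6, 2
X_r = rng.standard_normal((d, N_r))      # retained-set representations (d x N_r)
Theta = rng.standard_normal((E, d))      # router weights

# Null-space projector from the Gram matrix of X_r (threshold as in Section 2).
eigvals, eigvecs = np.linalg.eigh(X_r @ X_r.T)
U_bar = eigvecs[:, eigvals < 1e-2]
P_null = U_bar @ U_bar.T

# An arbitrary update restricted to the null space: large in parameter space,
# invisible to retain-set routing.
Delta = rng.standard_normal((E, d)) @ P_null
Theta_new = Theta + Delta

topk = lambda Th, x: set(np.argsort(Th @ x)[-k:])
unchanged = all(topk(Theta, x) == topk(Theta_new, x) for x in X_r.T)
moved = np.linalg.norm(Delta) > 0.0      # the parameters themselves did change
```

With $d = 32$ and $\mathrm{rank}(X_r) = 8$, the null space here has dimension 24, which is the "plasticity" the text refers to.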

5. Evaluation Metrics

Three central metrics quantify unlearning and model preservation effectiveness:

  • Forget Accuracy (FA): Percentage of forget-set queries correctly answered after unlearning (lower is preferable).
  • Retain Accuracy (RA): Percentage of retain-set queries correctly answered post-unlearning (higher is desirable).
  • Routing Stability (RS): Jaccard similarity of expert selections for evaluation queries pre- and post-unlearning, as formulated above (Zhu et al., 23 Jan 2026).

These metrics enable comprehensive assessment of both knowledge erasure and preservation fidelity within MoE architectures.

6. Empirical Findings

Experiments on a 30B-parameter MoE (Qwen3-MoE-30B-A3B) with 128 experts per layer and top-8 routing configuration, evaluated over benchmarks such as WMDP (hazardous knowledge) and MUSE (fictional content), report:

  • Routing Stability: RS improves from 0.21 (no constraint) to at least 0.94 with the online GRIP constraint, and reaches 0.99–1.00 with the PTC variant.
  • Retain Accuracy: Relative improvement ranges from 59% to 94% over unconstrained baselines, with performance matching or surpassing dense-model baselines.
  • Unlearning Effectiveness: Forget accuracies with GRIP (e.g., 0.24) are comparable to or better than unconstrained methods (e.g., 0.26).
  • Adversarial Robustness: Recovery via expert-forcing attacks drops from 61% (baseline) to 3% under GRIP, indicating authentic removal of sensitive knowledge rather than mere circumvention through routing (Zhu et al., 23 Jan 2026).

7. Limitations and Scope

GRIP's deployment entails notable considerations:

  • Memory Overhead: Storing $X_{r,\ell}$ for all layers scales as $O(L d N_r)$ (approximately 0.5 GB for a 30B-parameter model), mitigated by offloading strategies.
  • Computational Complexity: The PTC variant's $O(d^3)$ per-layer cost can be prohibitive for very high-dimensional routers, implying a need for future approximate methods such as SVD or sketching.
  • Architectural Scope: The framework presumes discrete top-$k$ MoE routing; soft or alternative routing mechanisms require adapted constraint formulations.
  • Verification Challenge: Perfect routing stability masks residual router drifts, rendering behavioral validation inconclusive; parameter-level audits or membership-inference assessments remain necessary for rigorous certification.
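The ~0.5 GB memory figure above is consistent with a back-of-the-envelope estimate; all concrete values below (layer count, hidden size, retained-sample count, fp16 storage) are illustrative assumptions, not taken from the paper:

```python
# Cache size for storing X_{r,l} across all layers: O(L * d * N_r) elements.
L_layers, d, N_r, bytes_per_elem = 48, 2048, 2730, 2   # assumed dims, fp16
cache_gb = L_layers * d * N_r * bytes_per_elem / 1024**3   # roughly half a GB
```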

A plausible implication is that extending GRIP to architectures with non-discrete or dynamic routing remains an open research direction, and that full verification of knowledge erasure demands evidence beyond behavioral metrics (Zhu et al., 23 Jan 2026).


The GRIP framework establishes a principled, geometric approach to enforcing expert-selection invariance in machine unlearning for MoE architectures, preserving utility while ensuring authentic knowledge erasure without reliance on routing manipulation shortcuts (Zhu et al., 23 Jan 2026).
