Orthogonality of SFT and RL Updates in LLMs
- This topic concerns the geometric relationship between SFT and RL parameter updates, using cosine similarity to assess interference in high-dimensional LLM parameter space.
- Empirical evidence from SVD and gradient concentration analyses shows near-zero alignment between SFT and RL updates, supporting modular skill transfer.
- Counterevidence reveals gradient coupling in overlapping loss landscapes, underscoring practical limitations and the need for mitigation strategies.
Supervised Fine-Tuning (SFT) and Reinforcement Learning (RL) parameter update orthogonality refers to the geometric relationship between these two optimization regimes in high-dimensional parameter space during the post-training of LLMs. The concept addresses whether, and under what conditions, updates from SFT and RL interfere, compete, or instead act independently (“orthogonally”) on different functional subspaces, with implications for catastrophic forgetting, knowledge consolidation, reasoning ability, and generalization.
1. Geometric Definition and Theoretical Context
Orthogonality of SFT and RL parameter updates formalizes whether the update vectors $\Delta\theta_{\mathrm{SFT}}$ and $\Delta\theta_{\mathrm{RL}}$ (where $\Delta\theta_{\mathrm{SFT}}$ is the parameter change from SFT and $\Delta\theta_{\mathrm{RL}}$ the change from RL) satisfy $\cos(\Delta\theta_{\mathrm{SFT}}, \Delta\theta_{\mathrm{RL}}) \approx 0$ (zero or near-zero cosine similarity) in the ambient parameter space. Strict orthogonality implies, to first order, no destructive interference: one type of update does not undo or affect what the other achieves. In contrast, non-orthogonality (nonzero cosine similarity) implies coupling: progress along one objective (e.g., cross-entropy minimization in SFT or reward maximization in RL) shifts the other, and negatively aligned updates degrade it, reflecting incompatible or antagonistic update directions.
This geometric view connects to functional specialization, subspace disentanglement, and error decomposition in high-dimensional models, offering a principled lens for understanding optimization interference and the limitations of naïve sequential or alternating fine-tuning.
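As a minimal sketch of the definition above, the cosine similarity between the SFT and RL update vectors can be computed by flattening the per-tensor parameter differences between checkpoints. The checkpoint dictionaries, tensor names, and noise scales below are toy assumptions for illustration, not values from any cited paper.

```python
import numpy as np

def flatten_update(before, after):
    """Concatenate per-tensor differences (after - before) into one vector."""
    keys = sorted(before)  # fixed key order so both updates are aligned
    return np.concatenate([(after[k] - before[k]).ravel() for k in keys])

def cosine_similarity(u, v, eps=1e-12):
    """cos(u, v) = <u, v> / (||u|| ||v||); eps guards against zero-norm updates."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v) + eps))

rng = np.random.default_rng(0)

# Hypothetical toy "checkpoints": name -> array (stand-ins for real model weights).
base = {"w1": rng.normal(size=(64, 64)), "w2": rng.normal(size=(64,))}
sft = {k: v + 0.01 * rng.normal(size=v.shape) for k, v in base.items()}
rl = {k: v + 0.01 * rng.normal(size=v.shape) for k, v in sft.items()}

delta_sft = flatten_update(base, sft)  # change produced by the SFT stage
delta_rl = flatten_update(sft, rl)     # change produced by the subsequent RL stage

cos = cosine_similarity(delta_sft, delta_rl)
print(f"cosine(delta_sft, delta_rl) = {cos:.4f}")
```

Independent random directions in high dimension are themselves nearly orthogonal, so a near-zero cosine is only informative when compared against a suitable baseline.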
2. Empirical Evidence from Parameter-Space Analysis
Direct evidence for SFT/RL orthogonality comes from explicit measurement of parameter update alignment on modern LLMs:
- Cosine Similarity of Updates: "Knowledge is Not Enough: Injecting RL Skills for Continual Adaptation" quantifies the cosine similarity at the parameter and layer level after sequential SFT and RL. On Qwen2.5-7B-Instruct, all transformer layers yield near-zero cosines, confirming near-orthogonality. As a control, SFT on disjoint data yields cosines of $0.3$–$0.6$, demonstrating that the result is not a generic artifact of high dimensionality but reflects genuine subspace separation (Tang et al., 16 Jan 2026). A theoretical argument, assuming LayerNorm-induced isotropy, further shows negligible expected overlap of propagated signals between SFT and RL update directions.
- Spectral Analysis: In "RL Fine-Tuning Heals OOD Forgetting in SFT," SVD-based diagnostics on weight matrices measure rotation and cosine similarity in the subspaces spanned by the principal singular vectors ($U$, $V$). The cosine similarity between SFT-induced and RL-induced rotations is near zero for almost all layers and singular directions ($\approx 0$ in heads, slightly negative at intermediate indices), indicating that RL reverses or undoes SFT-induced rotations in largely orthogonal directions (Jin et al., 8 Sep 2025). Analogously, "RL Is Neither a Panacea Nor a Mirage" and "The Path Not Taken" show, through principal-subspace projectors and update alignment, that RL fine-tuning changes low-curvature, off-principal directions orthogonal to the high-energy subspace that SFT predominantly affects (Jin et al., 22 Aug 2025, Zhu et al., 11 Nov 2025).
- PRISM and Gradient Concentration: While "Consolidation or Adaptation? PRISM" does not directly measure cosine similarity, it introduces a routing scheme based on gradient-concentration (Gini, kurtosis, coefficient of variation) to allocate data to SFT (diffuse, broad-gradient updates) or RL (concentrated, localized updates). The separation ensures that SFT updates do not interfere with RL-operated subspaces, achieving empirical near-orthogonality at the regime level, as evidenced by ablation studies and superior downstream task performance (Zhao et al., 12 Jan 2026).
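The gradient-concentration statistics named above (Gini, kurtosis, coefficient of variation) can be sketched as follows. The formulas are standard definitions applied to gradient magnitudes; the `route` function and its threshold are hypothetical stand-ins and should not be read as PRISM's actual routing rule.

```python
import numpy as np

def gini(x):
    """Gini coefficient of |x|: 0 for perfectly even mass, near 1 for concentrated mass."""
    mags = np.sort(np.abs(np.asarray(x, dtype=float)).ravel())
    n, total = mags.size, mags.sum()
    if total == 0.0:
        return 0.0
    ranks = np.arange(1, n + 1)
    return float(2.0 * np.sum(ranks * mags) / (n * total) - (n + 1) / n)

def kurtosis(x):
    """Non-excess kurtosis E[(x - mu)^4] / sigma^4 (about 3 for a Gaussian)."""
    x = np.asarray(x, dtype=float).ravel()
    centered = x - x.mean()
    var = np.mean(centered ** 2)
    return float(np.mean(centered ** 4) / (var ** 2 + 1e-30))

def coeff_of_variation(x):
    """Standard deviation of |x| divided by its mean."""
    mags = np.abs(np.asarray(x, dtype=float)).ravel()
    return float(mags.std() / (mags.mean() + 1e-30))

def route(grad, gini_threshold=0.8):
    """Hypothetical rule: concentrated gradients go to RL, diffuse ones to SFT."""
    return "rl" if gini(grad) > gini_threshold else "sft"

rng = np.random.default_rng(0)
diffuse = rng.normal(size=10_000)   # mass spread over all coordinates
concentrated = np.zeros(10_000)
concentrated[:10] = 1.0             # mass on a handful of coordinates

for name, g in [("diffuse", diffuse), ("concentrated", concentrated)]:
    print(name, round(gini(g), 3), round(kurtosis(g), 1),
          round(coeff_of_variation(g), 2), route(g))
```

In practice the three statistics would be computed on per-sample parameter gradients; the toy vectors here only illustrate how a diffuse and a concentrated gradient separate under these measures.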
3. Counterevidence: Gradient Coupling and Theoretical Non-Orthogonality
Not all studies support strict orthogonality. "On the Non-decoupling of Supervised Fine-tuning and Reinforcement Learning in Post-training" proves, with formal theorems and empirical validation, that SFT and RL updates are coupled: each loss gradient has a nonzero inner product with the other's gradient, in both the SFT-then-RL and RL-then-SFT orderings. Schematically, writing $\mathcal{L}_{\mathrm{SFT}}$ and $\mathcal{L}_{\mathrm{RL}}$ for the two objectives,
$$\langle \nabla_\theta \mathcal{L}_{\mathrm{SFT}}(\theta),\ \nabla_\theta \mathcal{L}_{\mathrm{RL}}(\theta) \rangle \neq 0,$$
implying that improving one loss increases regret for the other. This coupling manifests as observed increases in SFT test loss after RL, and as reward degradation when SFT follows RL, on Qwen3-0.6B (Niu et al., 12 Jan 2026). Mitigation involves mixture or constraint-based optimization, but complete decoupling via subspace orthogonality is theoretically unattainable whenever the objectives differ.
Similarly, in the context of vision-language models (VLMs), "The Synergy Dilemma" observes that SFT and RL parameter updates are partially misaligned but not strictly orthogonal (cosine in $(0,1)$). All model-merging and interpolation attempts result in trade-offs, never fully additive gains, consistent with "partial antagonism" in update directions (Chen et al., 10 Jul 2025).
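The effect of a nonzero gradient inner product can be illustrated with a toy calculation. A gradient step of size $\eta$ on one loss changes the other loss by approximately $-\eta$ times the inner product of the two gradients, so a negative inner product raises the other loss. The two quadratic losses below are illustrative stand-ins for the SFT and RL objectives, not the objectives analyzed in the cited papers.

```python
import numpy as np

def loss(theta, target):
    """Toy quadratic loss 0.5 * ||theta - target||^2."""
    return 0.5 * float(np.sum((theta - target) ** 2))

def grad(theta, target):
    """Gradient of the toy quadratic loss with respect to theta."""
    return theta - target

theta = np.array([0.0, 0.0])
target_a = np.array([1.0, 0.0])    # optimum of objective A (stand-in for SFT)
target_b = np.array([-1.0, 0.5])   # optimum of objective B (stand-in for RL)

eta = 0.1
g_a = grad(theta, target_a)
g_b = grad(theta, target_b)
inner = float(np.dot(g_a, g_b))    # negative here: the objectives conflict

theta_next = theta - eta * g_a     # one gradient step on objective A only

change_a = loss(theta_next, target_a) - loss(theta, target_a)
actual_change_b = loss(theta_next, target_b) - loss(theta, target_b)
first_order_change_b = -eta * inner  # first-order Taylor prediction for B

print(f"<g_a, g_b> = {inner:.3f}")
print(f"loss A change = {change_a:.4f}")
print(f"loss B change = {actual_change_b:.4f} (first-order: {first_order_change_b:.4f})")
```

For these quadratics the gap between the actual and first-order change is exactly the second-order term $\tfrac{1}{2}\eta^2 \lVert g_a \rVert^2$; with orthogonal gradients the first-order term would vanish.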
4. Mechanistic Origins and SVD-Based Subspace Decomposition
A deeper mechanistic perspective arises from SVD analysis:
- Principal Subspace vs. Orthogonal Complement: In "The Path Not Taken," model parameters are decomposed into projections onto the top-$k$ singular directions (the principal subspace) and onto their orthogonal complement. SFT updates align with the principal subspace, producing spectral drift and subspace rotation, while RL updates target the orthogonal complement, preserving the spectrum and principal alignment.
- Hessian Geometry: The Three-Gate Theory posits that RL, due to KL constraints and local curvature, is forced into low-curvature, off-principal directions (flatter valleys in parameter space), while SFT exploits high-curvature, pre-learned modes for rapid pattern adjustment (Zhu et al., 11 Nov 2025).
- Rotation, Not Scaling: Consistently, across several works, the singular-value spectrum is almost unchanged during SFT and RL; what shifts are the dominant singular vectors (i.e., the orientation of model capacities), with SFT causing large rotations that can drive OOD forgetting, and RL partially reversing or correcting these rotations—again in approximately orthogonal directions (Jin et al., 22 Aug 2025, Jin et al., 8 Sep 2025).
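One way to make the principal-subspace picture concrete is to measure how much of an update's Frobenius energy falls inside the top-$k$ singular subspace of a weight matrix. The sketch below uses a two-sided projection onto the top-$k$ left and right singular vectors; this particular projector, the matrix sizes, and the function names are assumptions for illustration and may differ from the diagnostics in the cited papers.

```python
import numpy as np

def principal_projectors(W, k):
    """Orthogonal projectors onto the top-k left and right singular subspaces of W."""
    U, _, Vt = np.linalg.svd(W, full_matrices=False)
    Uk, Vk = U[:, :k], Vt[:k, :].T
    return Uk @ Uk.T, Vk @ Vk.T

def principal_energy_fraction(W, dW, k):
    """Fraction of ||dW||_F^2 lying in the top-k singular subspace of W."""
    Pu, Pv = principal_projectors(W, k)
    dW_principal = Pu @ dW @ Pv
    return float(np.linalg.norm(dW_principal) ** 2 / (np.linalg.norm(dW) ** 2 + 1e-30))

rng = np.random.default_rng(0)
n, k = 8, 2

# Toy weight matrix with known, well-separated singular values.
U, _ = np.linalg.qr(rng.normal(size=(n, n)))
V, _ = np.linalg.qr(rng.normal(size=(n, n)))
W = U @ np.diag(np.linspace(10.0, 1.0, n)) @ V.T

# An update confined to the principal subspace, and one confined to its complement.
dW_principal = U[:, :k] @ rng.normal(size=(k, k)) @ V[:, :k].T
dW_off = U[:, k:] @ rng.normal(size=(n - k, n - k)) @ V[:, k:].T

frac_principal = principal_energy_fraction(W, dW_principal, k)
frac_off = principal_energy_fraction(W, dW_off, k)
frac_mixed = principal_energy_fraction(W, dW_principal + dW_off, k)
print(frac_principal, frac_off, frac_mixed)
```

Under the account above, an SFT-like update would score close to 1 on this measure and an RL-like update close to 0, with real updates falling somewhere in between.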
5. Algorithms and Mitigation Strategies for Interference
Approaches to reduce destructive interference include:
- Gradient Routing by Geometry: "PRISM" routes data to SFT or RL based on the spatial geometry (concentration) of per-sample parameter gradients, ensuring that SFT produces broad, orthogonal tweaks, whereas RL reserves large updates for localized, conflict-inducing data (Zhao et al., 12 Jan 2026).
- Meta-Learned Fusion: Bilevel optimization frameworks such as BRIDGE (in "Beyond Two-Stage Training") employ adaptive convex blending of SFT and RL gradients. While not enforcing hard orthogonality, this strategy reduces interference and adapts SFT signals to cooperate with RL by explicit meta-gradient updates of a cooperative gain term. Empirically, this yields better overall stability and efficiency (Chen et al., 8 Sep 2025).
- Linear Skill Injection: The near-orthogonality established in "Knowledge is Not Enough" enables modular transfer: an RL-derived skill vector can be linearly combined with a newly SFT-tuned model, directly injecting complex skills with minimal retraining, a strategy yielding strong empirical gains (Tang et al., 16 Jan 2026).
- Spectrum-Aware Restoration: Replacing or restoring top singular directions lost during SFT with their pre-fine-tuning values, or leveraging low-rank subspace merging, can rapidly recover generalization and OOD performance, sidestepping costly RL retraining (Jin et al., 22 Aug 2025, Jin et al., 8 Sep 2025).
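The linear skill injection idea above can be sketched as simple parameter arithmetic: take the difference between an RL-tuned checkpoint and the SFT checkpoint it started from, then add that difference to a newly SFT-tuned model. The function names, the scaling factor `alpha`, and the toy parameters are assumptions for illustration; the exact procedure in "Knowledge is Not Enough" may differ.

```python
import numpy as np

def skill_vector(theta_rl, theta_sft):
    """Per-tensor difference between the RL checkpoint and its SFT starting point."""
    return {k: theta_rl[k] - theta_sft[k] for k in theta_sft}

def inject_skill(theta_new_sft, skill, alpha=1.0):
    """Add a scaled skill vector to a newly SFT-tuned model (alpha is hypothetical)."""
    return {k: theta_new_sft[k] + alpha * skill[k] for k in theta_new_sft}

# Toy 3-parameter "model" with updates along mutually orthogonal axes.
base = {"w": np.zeros(3)}
old_sft = {"w": base["w"] + np.array([1.0, 0.0, 0.0])}    # old knowledge update
rl = {"w": old_sft["w"] + np.array([0.0, 0.5, 0.0])}      # RL skill update
new_sft = {"w": base["w"] + np.array([0.0, 0.0, 2.0])}    # new knowledge update

skill = skill_vector(rl, old_sft)
merged = inject_skill(new_sft, skill, alpha=1.0)
print(merged["w"])
```

Because the toy updates are orthogonal, the injected skill leaves the coordinates written by the new SFT stage untouched; with only approximately orthogonal updates, some interference would remain and `alpha` would need tuning.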
6. Practical Impact and Limitations
Reported empirical consequences of SFT/RL update geometry include:
- Catastrophic Forgetting Mitigation: Disentangling SFT and RL regimes, via gradient routing or subspace decoupling, improves final performance, reduces RL sample complexity, and prevents catastrophic forgetting of early SFT-acquired knowledge (Zhao et al., 12 Jan 2026, Jin et al., 8 Sep 2025).
- Skill–Knowledge Modularization: Modular skill transfer via orthogonal composition supports rapid adaptation to new domains or tasks without repeated, costly RL optimization (Tang et al., 16 Jan 2026).
- Boundaries of Recovery: When the SFT stage induces excessive singular-vector rotation (overfitting), even an orthogonal RL update cannot fully recover lost OOD capabilities, imposing practical constraints on sequential two-stage protocols (Jin et al., 22 Aug 2025).
However, strict orthogonality is neither universal nor unconditional. The existence of objective coupling, especially in domains or architectures where SFT and RL loss landscapes substantially overlap, precludes total independence. Some reported approaches may only achieve approximate or partial orthogonality, and intervening with hard constraints (e.g., projection of gradients) is not always practical or effective given the complex, evolving geometry of LLM parameter space (Niu et al., 12 Jan 2026, Chen et al., 10 Jul 2025).
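For reference, the kind of hard constraint mentioned above (projection of gradients) can be sketched as removing from one gradient its component along the other, similar in spirit to gradient-surgery methods from multi-task learning. The function names and the choice to project only on conflict are assumptions for illustration; none of the cited papers is claimed to use exactly this step.

```python
import numpy as np

def project_out(g, ref, eps=1e-12):
    """Remove from g its component along ref, so the result is orthogonal to ref."""
    return g - (np.dot(g, ref) / (np.dot(ref, ref) + eps)) * ref

def project_if_conflicting(g, ref):
    """Project only when g opposes ref (negative inner product); otherwise keep g."""
    return project_out(g, ref) if np.dot(g, ref) < 0 else g

g_sft = np.array([0.0, 1.0])          # toy SFT gradient (reference direction)
g_rl_conflict = np.array([1.0, -1.0])  # toy RL gradient that opposes g_sft
g_rl_aligned = np.array([1.0, 1.0])    # toy RL gradient that agrees with g_sft

fixed = project_if_conflicting(g_rl_conflict, g_sft)
kept = project_if_conflicting(g_rl_aligned, g_sft)
print(fixed, kept)
```

The projection is computed at a single point in parameter space, which is one reason such constraints can be hard to maintain as the geometry shifts during training.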
7. Summary Table: Empirical Findings on SFT/RL Update Orthogonality
| Paper | Orthogonality Metric | Result |
|---|---|---|
| "Knowledge is Not Enough" (Tang et al., 16 Jan 2026) | Update cosine similarity | Near-zero cosine: strong orthogonality (all layers) |
| "RL Fine-Tuning Heals OOD Forgetting" (Jin et al., 8 Sep 2025) | Singular-vector rotation & cosine | Cosine $\approx 0$ in most regions; RL undoes SFT rotations orthogonally |
| "The Path Not Taken" (Zhu et al., 11 Nov 2025) | Principal subspace projection | SFT: align with principal; RL: align off-principal; overlap 0.08 |
| "PRISM" (Zhao et al., 12 Jan 2026) | Gradient concentration | Indirect evidence: regime separation reduces interference |
| "Non-decoupling..." (Niu et al., 12 Jan 2026) | Gradient inner product | Inner products nonzero: strict orthogonality provably absent |
| "Synergy Dilemma" (Chen et al., 10 Jul 2025) | (Not directly measured) | Partial misalignment ($\cos \in (0,1)$); trade-offs in interpolation curves |
References
- "Knowledge is Not Enough: Injecting RL Skills for Continual Adaptation" (Tang et al., 16 Jan 2026)
- "RL Fine-Tuning Heals OOD Forgetting in SFT" (Jin et al., 8 Sep 2025)
- "The Path Not Taken: RLVR Provably Learns Off the Principals" (Zhu et al., 11 Nov 2025)
- "Consolidation or Adaptation? PRISM: Disentangling SFT and RL Data via Gradient Concentration" (Zhao et al., 12 Jan 2026)
- "On the Non-decoupling of Supervised Fine-tuning and Reinforcement Learning in Post-training" (Niu et al., 12 Jan 2026)
- "RL Is Neither a Panacea Nor a Mirage: Understanding Supervised vs. Reinforcement Learning Fine-Tuning for LLMs" (Jin et al., 22 Aug 2025)
- "The Synergy Dilemma of Long-CoT SFT and RL: Investigating Post-Training Techniques for Reasoning VLMs" (Chen et al., 10 Jul 2025)
- "Beyond Two-Stage Training: Cooperative SFT and RL for LLM Reasoning" (Chen et al., 8 Sep 2025)