Slerp Merging on Manifolds
- Slerp Merging is a technique that uses spherical linear interpolation to combine points on a unit sphere, preserving directions and norms for geometric fidelity.
- It is applied in numerical integration schemes, neural network token aggregation, and rotational interpolation, ensuring stability and accurate manifold constraints.
- The method overcomes shortcomings of Euclidean averaging by following geodesic paths, thereby reducing extrinsic errors and maintaining intrinsic properties.
Slerp Merging refers to the use of Spherical Linear Interpolation (SLERP) to geometrically average or interpolate between points, states, policies, or feature vectors that are constrained to or distributed on a unit sphere or, more generally, a Riemannian manifold with spherical symmetry. Its fundamental mechanism leverages the preservation of directions and norms along geodesics, distinguishing it from standard Euclidean averaging, which may introduce extrinsic errors and violate manifold constraints. SLERP merging is now a central paradigm in numerical integration schemes on manifolds, neural network token aggregation, policy/model merging, and rotational interpolation, where respecting intrinsic geometry is crucial for fidelity, stability, and interpretability.
1. Mathematical Definition and SLERP Formula
SLERP is a parametric interpolation between two points $p_0$ and $p_1$ on the unit sphere, governed by the geodesic (great-circle) arc rather than the straight line. The formula is
$$\mathrm{Slerp}(p_0, p_1; t) = \frac{\sin\big((1-t)\theta\big)}{\sin\theta}\, p_0 + \frac{\sin(t\theta)}{\sin\theta}\, p_1,$$
where $\theta$ is the angle between $p_0$ and $p_1$, and $t \in [0,1]$ parameterizes the interpolation along the geodesic. For quaternion rotations, the formula generalizes naturally, and for high-dimensional spheres the angle is obtained from the dot product of the unit vectors, $\cos\theta = p_0 \cdot p_1$. SLERP preserves the norm ($\|\mathrm{Slerp}(p_0, p_1; t)\| = 1$), ensures all interpolated points remain on the sphere, and is time-reversible and symplectic when embedded in dynamical systems.
A key extension is multiway spherical merging, as seen in MLERP (Kim et al., 2023), which combines several vectors according to their mutual angles and weights, generalizing the two-point SLERP to produce a single normalized output by following hyperspherical trajectories.
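The two-point formula above is easy to sanity-check numerically. The following minimal NumPy sketch verifies the norm-preservation property on a pair of orthogonal unit vectors:

```python
import numpy as np

def slerp(p0, p1, t):
    """Spherical linear interpolation between unit vectors p0 and p1."""
    theta = np.arccos(np.clip(np.dot(p0, p1), -1.0, 1.0))  # angle between inputs
    if theta < 1e-12:                       # nearly parallel: linear blend suffices
        out = (1 - t) * p0 + t * p1
        return out / np.linalg.norm(out)
    return (np.sin((1 - t) * theta) * p0 + np.sin(t * theta) * p1) / np.sin(theta)

p0 = np.array([1.0, 0.0, 0.0])
p1 = np.array([0.0, 1.0, 0.0])              # a quarter great-circle apart (theta = pi/2)
for t in np.linspace(0.0, 1.0, 11):
    assert abs(np.linalg.norm(slerp(p0, p1, t)) - 1.0) < 1e-12  # stays on the sphere
print(slerp(p0, p1, 0.5))                   # arc midpoint: [0.7071..., 0.7071..., 0]
```

Linear interpolation of the same pair would pass through a point of norm $\sqrt{2}/2 \approx 0.707$, illustrating the extrinsic error that SLERP avoids.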
2. SLERP Merging in Numerical Integration on Spheres
Time-stepping schemes for ordinary differential equations (ODEs) and stiff systems constrained to the unit sphere extensively use SLERP merging. In SLERP-TVDRK (Total Variation Diminishing Runge-Kutta with SLERP) and implicit spherical Crank–Nicolson integrators (Leung et al., 14 Oct 2024, Leung, 22 Mar 2025), intermediate stage values and update formulas use SLERP, combined with the exponential map:
- Stage values are moved via the exponential map, which produces exact geodesic movement from a base point by a tangent vector.
- Convex combinations (required for high-order Runge-Kutta or midpoint evaluations) replace standard arithmetic averaging with SLERP to guarantee that merged stages remain on the sphere.
- For instance, in the STVDRK2 integrator, the forward stage $p^{(1)} = \exp_{p^n}\!\big(\Delta t\, f(p^n)\big)$ is followed by the SLERP midpoint merge
$$p^{n+1} = \mathrm{Slerp}\Big(p^n,\ \exp_{p^{(1)}}\!\big(\Delta t\, f(p^{(1)})\big);\ \tfrac{1}{2}\Big),$$
the spherical analogue of the Heun (TVD-RK2) convex combination $\tfrac{1}{2}u^n + \tfrac{1}{2}\big(u^{(1)} + \Delta t\, f(u^{(1)})\big)$.
This approach eliminates the need for projection steps post-Euler/Runge-Kutta updates. The SLERP-based merging preserves both geometric constraints and high-order convergence for up to third-order schemes; going beyond third order introduces ambiguity due to SLERP’s non-associativity in multiway combinations (Leung et al., 14 Oct 2024).
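A minimal sketch of such a scheme, assuming a tangent vector field `f` and the structure described above (exp-map forward stages, SLERP midpoint merge); the rigid-rotation test field is an illustrative choice, not from the cited papers:

```python
import numpy as np

def slerp(p0, p1, t):
    """Spherical linear interpolation between unit vectors p0 and p1."""
    theta = np.arccos(np.clip(np.dot(p0, p1), -1.0, 1.0))
    if theta < 1e-12:
        out = (1 - t) * p0 + t * p1
        return out / np.linalg.norm(out)
    return (np.sin((1 - t) * theta) * p0 + np.sin(t * theta) * p1) / np.sin(theta)

def exp_map(p, v):
    """Exponential map on the unit sphere: exact geodesic step from p along tangent v."""
    nv = np.linalg.norm(v)
    if nv < 1e-12:
        return p
    return np.cos(nv) * p + np.sin(nv) * (v / nv)

def stvdrk2_step(p, f, dt):
    """One step of a SLERP-TVDRK2-style scheme: exp-map forward stages, then SLERP
    replaces the Heun convex combination, so no post-step projection is needed."""
    p1 = exp_map(p, dt * f(p))           # forward (Euler) stage on the sphere
    p2 = exp_map(p1, dt * f(p1))         # second forward stage
    return slerp(p, p2, 0.5)             # geodesic midpoint = intrinsic 1/2-1/2 average

# Test field: rigid rotation about the z-axis (tangent to the sphere everywhere).
omega = np.array([0.0, 0.0, 1.0])
f = lambda p: np.cross(omega, p)

p = np.array([1.0, 0.0, 0.0])
dt, steps = 0.01, 100
for _ in range(steps):
    p = stvdrk2_step(p, f, dt)
print(np.linalg.norm(p), p)              # norm stays 1; p ≈ (cos 1, sin 1, 0)
```

Every iterate lies on the sphere by construction, so the scheme needs no renormalization; for this rotation field the exp-map stages follow the exact flow, and the trajectory matches the analytic rotation.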
3. Model and Token Merging in Machine Learning
In Transformer and LLM architectures, SLERP-inspired merging (see MLERP in Token Fusion (Kim et al., 2023) and WARP policy merging (Ramé et al., 24 Jun 2024)) provides an intrinsic solution to merging neural representations, aggregation of token embeddings, and averaging policy weights:
- MLERP adapts SLERP to fuse input token embeddings via weighted geodesic interpolation, preserving the magnitude and angular distribution. Schematically, the merged token is the weighted spherical combination
$$\tilde{x} = \bar{r}\,\frac{\sum_i w_i x_i}{\big\|\sum_i w_i x_i\big\|}, \qquad \bar{r} = \textstyle\sum_i w_i \|x_i\|,$$
where the weights $w_i$ are chosen based on pairwise similarities, generalizing SLERP's mechanism to multiple inputs.
- In WARP (Ramé et al., 24 Jun 2024), SLERP merging is applied to independent policy weights $\theta_1$ and $\theta_2$, anchoring them via the initialization $\theta_{\text{init}}$ and then merging the task vectors $\delta_i = \theta_i - \theta_{\text{init}}$ in policy space using
$$\theta_{\text{merged}} = \theta_{\text{init}} + \frac{\sin\big((1-\lambda)\Omega\big)}{\sin\Omega}\,\delta_1 + \frac{\sin(\lambda\Omega)}{\sin\Omega}\,\delta_2,$$
where $\Omega$ is the angle between $\delta_1$ and $\delta_2$. This preserves the "direction" and "magnitude" of updates, optimizing for both KL regularization and reward.
These methods outperform linear averaging in scenarios where vector directions and norms encode semantic or reward information. SLERP merging reduces distributional shift, improves accuracy, and preserves properties expected by downstream consumer layers.
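Both merges can be sketched as follows. `mlerp_merge` is a hypothetical norm-preserving simplification (the similarity-based weights are assumed given, not computed as in the paper), and `warp_slerp_merge` follows the task-vector anchoring described above:

```python
import numpy as np

def slerp(v0, v1, t):
    """SLERP between two (not necessarily unit) vectors, via the angle between them."""
    u0, u1 = v0 / np.linalg.norm(v0), v1 / np.linalg.norm(v1)
    omega = np.arccos(np.clip(np.dot(u0, u1), -1.0, 1.0))
    if omega < 1e-12:
        return (1 - t) * v0 + t * v1
    return (np.sin((1 - t) * omega) * v0 + np.sin(t * omega) * v1) / np.sin(omega)

def mlerp_merge(tokens, weights):
    """Hypothetical simplification of MLERP-style merging: weighted mean direction,
    rescaled to the weighted mean input norm so magnitude is preserved."""
    tokens = np.asarray(tokens, dtype=float)
    weights = np.asarray(weights, dtype=float) / np.sum(weights)
    mean = weights @ tokens                               # Euclidean weighted mean
    mean_norm = weights @ np.linalg.norm(tokens, axis=1)  # target magnitude
    return mean_norm * mean / np.linalg.norm(mean)

def warp_slerp_merge(theta_init, theta_1, theta_2, lam=0.5):
    """Merge two fine-tuned policies by SLERPing their task vectors about the init anchor."""
    delta_1, delta_2 = theta_1 - theta_init, theta_2 - theta_init
    return theta_init + slerp(delta_1, delta_2, lam)

tok = mlerp_merge([[2.0, 0.0], [0.0, 2.0]], [0.5, 0.5])
print(np.linalg.norm(tok))   # 2.0: the merged token keeps the inputs' magnitude

pol = warp_slerp_merge(np.zeros(2), np.array([1.0, 0.0]), np.array([0.0, 1.0]))
print(pol)                   # ≈ [0.707, 0.707]
```

Note that plain averaging of the two example tokens would shrink the norm from 2 to $\sqrt{2}$, the distributional distortion these methods are designed to avoid.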
4. Rotational Interpolation and SLERP Merging
SLERP is foundational for rotational interpolation in computer graphics, robotics, and physics, especially for unit quaternions representing rotations in $SO(3)$, identified with points on the 3-sphere $S^3$ (Kapić et al., 2021). The Kuramoto–Lohe Interpolation (KLI) algorithm provides a dynamical merging process via the non-Abelian Kuramoto model. The evolution equation drives the state $q(t)$ toward a target quaternion $p$ along a sphere-preserving attractor flow of the form
$$\dot{q}(t) = \kappa\big(p - \langle q(t), p\rangle\, q(t)\big).$$
- Starting from $q(0) = q_0$, this flow is numerically integrated until $q(t)$ reaches the target $q_1$, staying always on $S^3$.
- SLERP provides the geodesic, analytic counterpart, $q(t) = \mathrm{Slerp}(q_0, q_1; t)$, which corresponds to the shortest arc.
While KLI gives a dynamical, simulation-based mechanism for merging rotation states, SLERP remains the standard for direct geodesic merging. The close match between their results validates SLERP as both a geometric and physical merging operator.
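The agreement between the dynamical and geodesic routes can be checked numerically. Below is a sketch assuming a Lohe-type attractor flow of the form given above (the exact coupling used by KLI may differ); the dynamical trajectory is shown to lie on the SLERP great-circle arc:

```python
import numpy as np

def slerp(q0, q1, t):
    """Geodesic interpolation between unit quaternions q0 and q1 on S^3."""
    theta = np.arccos(np.clip(np.dot(q0, q1), -1.0, 1.0))
    return (np.sin((1 - t) * theta) * q0 + np.sin(t * theta) * q1) / np.sin(theta)

def attractor_flow(q0, p, kappa=2.0, dt=1e-3, steps=5000):
    """Integrate dq/dt = kappa * (p - <q, p> q), a sphere-preserving flow toward p,
    with per-step renormalization to counter forward-Euler drift."""
    q = q0.copy()
    traj = [q.copy()]
    for _ in range(steps):
        q = q + dt * kappa * (p - np.dot(q, p) * q)
        q /= np.linalg.norm(q)
        traj.append(q.copy())
    return np.array(traj)

# Two unit quaternions a quarter-turn apart on S^3.
q0 = np.array([1.0, 0.0, 0.0, 0.0])
q1 = np.array([0.0, 1.0, 0.0, 0.0])

traj = attractor_flow(q0, q1)
mid = traj[len(traj) // 2]

# Recover the arc parameter s of the dynamical midpoint and compare it against
# the analytic geodesic point: the two constructions land on the same arc.
s = np.arccos(np.clip(np.dot(q0, mid), -1, 1)) / np.arccos(np.clip(np.dot(q0, q1), -1, 1))
print(np.linalg.norm(mid - slerp(q0, q1, s)))  # essentially zero: same great circle
```

The flow traverses the arc with its own (nonuniform) time parameterization, so the two methods agree on the path but not on the schedule, consistent with KLI being a dynamical rather than analytic interpolant.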
5. Limitations and Intrinsic Challenges of Slerp Merging
Despite its strengths, Slerp Merging faces intrinsic limitations:
- Extension to multiway averaging (three or more points) on the sphere is not uniquely defined; sequential SLERP (pairwise) is non-associative and sensitive to ordering. Fréchet means and iterative schemes are sometimes applied, but these are computationally expensive and may sacrifice accuracy for higher-order methods (Leung et al., 14 Oct 2024, Leung, 22 Mar 2025).
- For high-order explicit or implicit integrators, ambiguities arise in defining intrinsic weighted averages, constraining practical methods to orders not exceeding three.
- In ML token fusion, MLERP's efficiency gains are offset by increased trigonometric computations and complexity in defining the weights, especially for large numbers of merged tokens or large batch sizes (Kim et al., 2023).
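The ordering sensitivity of sequential pairwise SLERP is easy to exhibit; a small sketch with three orthonormal vectors:

```python
import numpy as np

def slerp(p0, p1, t):
    """Spherical linear interpolation between unit vectors p0 and p1."""
    theta = np.arccos(np.clip(np.dot(p0, p1), -1.0, 1.0))
    return (np.sin((1 - t) * theta) * p0 + np.sin(t * theta) * p1) / np.sin(theta)

e1, e2, e3 = np.eye(3)

# Two orderings of an "equal-weight" three-way merge via sequential pairwise SLERP:
# merge two vectors at t = 1/2, then fold in the third at t = 1/3.
m_a = slerp(slerp(e1, e2, 0.5), e3, 1 / 3)
m_b = slerp(slerp(e2, e3, 0.5), e1, 1 / 3)

print(np.linalg.norm(m_a - m_b))  # clearly nonzero: ordering matters on the sphere
```

Both results remain unit vectors, but they differ by a distance of roughly 0.16 here, which is exactly the ambiguity that Fréchet-mean formulations attempt to resolve.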
6. Practical Impact and Significance
Slerp Merging’s impact is most prominent where fidelity to spherical or manifold constraints is necessary:
- In scientific computing, SLERP-based time integrators provide robustness, improved conservation properties, and geometric exactness in simulations for physics, geometry, and engineering (Leung et al., 14 Oct 2024, Leung, 22 Mar 2025).
- In transformer-based neural models, MLERP merging preserves embedding magnitude and distribution, preventing accuracy degradation and distributional distortions under token reduction (Kim et al., 2023).
- For policy/model merging (RLHF, WARP), SLERP merging improves reward optimization and policy alignment while maintaining the pretraining prior (Ramé et al., 24 Jun 2024).
- For orientation/rotation trajectories, SLERP guarantees the shortest, physically consistent merge (applied to quaternions or axis-angle representations) (Kapić et al., 2021).
A plausible implication is that future developments in manifold-based averaging may enable higher-order SLERP-like schemes or scalable multiway token/policy merging algorithms in neural architectures.
7. Connections to High-Order and Non-Oscillatory Interpolation
High-order, non-oscillatory (ENO/WENO) spherical interpolation methods (Fong et al., 2022) extend the SLERP paradigm by adaptively merging candidate interpolants to avoid oscillations near discontinuities. While SLERP is optimal for geodesic pairwise smooth merging, ENO-type schemes select stencils with minimal curvature (or oscillatory indicator) to suppress Gibbs-like artifacts inherent to naive high-order spherical merging. SLERP forms the basic building block for these advanced schemes, but is limited to second-order accuracy except when recursively combined with ENO selection strategies.
Slerp Merging is a comprehensive geometric framework for combining states, feature vectors, or models on spheres and manifolds. It ensures fidelity to intrinsic geometry, preserves norm and direction, and is broadly deployed in numerical integration, neural model fusion, policy averaging, and rotational interpolation. While optimal for pairwise combinations, its extension to higher-order and multiway scenarios remains an active research domain.