Symplectic 16th-Order 8-Stage IRK Integrator

Updated 10 November 2025

The paper introduces a symplectic integrator based on Gauss–Legendre collocation that achieves 16th-order accuracy for non-stiff Hamiltonian systems.
It employs both fixed-point and simplified Newton iterations with SIMD optimization to significantly reduce computational costs.
The method preserves energy over long simulations using precision techniques like compensated summation and fused multiply-add instructions.

A symplectic 16th-order 8-stage implicit Runge-Kutta integrator is a numerical integration method specifically designed for non-stiff Hamiltonian systems, enabling extremely accurate long-time simulation with exact preservation of the symplectic structure. The integrator constructs its update via collocation at the Gauss–Legendre nodes, ensuring a global order of $2s = 16$ for $s=8$ stages. Recent research demonstrates that SIMD-vectorized implementations (e.g., IRKGL16-SIMD) can not only make such high-order implicit methods competitive in performance with explicit symplectic integrators but can even surpass them at high precision, particularly for moderate- to large-scale non-stiff Hamiltonian ODEs (Antoñana et al., 5 Nov 2025, Antoñana et al., 2022, Antoñana et al., 2017).

1. Gauss–Legendre Collocation and Symplecticity

The central principle underpinning the 16th-order, 8-stage symplectic IRK is the use of collocation at Gauss–Legendre nodes. For a general system

$\dot y = f(y), \quad y \in \mathbb{R}^D,$

the method seeks a degree- $s$ polynomial $P_n(t)$ over $[t_{n-1}, t_n]$ such that $P_n'(t_{n-1} + c_i h) = f(P_n(t_{n-1} + c_i h))$ at $s$ collocation points $c_i = (1+x_i)/2$ , where $\{x_i\}$ are the roots of the Legendre polynomial $P_s(x)$ on $[-1,1]$ . For $s=8$ , the method achieves classical order $16$.

The method is summarized in the Butcher tableau: $\begin{array}{c|cccccccc} c_1 & a_{11} & a_{12} & \cdots & a_{18} \ c_2 & a_{21} & a_{22} & \cdots & a_{28} \ \vdots & \vdots & \vdots & \ddots & \vdots \ c_8 & a_{81} & a_{82} & \cdots & a_{88} \ \hline & b_1 & b_2 & \cdots & b_8 \end{array}$ with $a_{ij} = \int_0^{c_i} \ell_j(\tau) d\tau$ , $b_j = \int_0^1 \ell_j(\tau) d\tau$ using the Lagrange basis $\{\ell_j\}$ .

Applied to Hamiltonian systems $\dot y = J^{-1} \nabla H(y)$ , the method is symplectic if and only if

$b_i a_{ij} + b_j a_{ji} = b_i b_j.$

Gauss–Legendre collocation satisfies this condition identically, guaranteeing both symplecticity and time symmetry (Antoñana et al., 5 Nov 2025, Antoñana et al., 2017, Antoñana et al., 2022).

2. Numerical Formulation and Solving the Implicit Equations

The method advances $y_{n}$ by solving for $s=8$ stage values $Y_i \in \mathbb{R}^D$ via

$Y_i = y_{n-1} + h \sum_{j=1}^s a_{ij} f(Y_j), \quad i = 1, \ldots, s,$

with the final update

$y_n = y_{n-1} + h \sum_{i=1}^s b_i f(Y_i).$

Two classes of nonlinear solvers are widely used:

Fixed-Point Iteration: For non-stiff Hamiltonian systems, the standard approach is to recast the scheme as

$L_i^{(k)} = h b_i f(Y_i^{(k-1)}), \quad Y_i^{(k)} = y_{n-1} + \sum_{j=1}^{s} \mu_{ij} L_j^{(k)},$

with $\mu_{ij} = a_{ij}/b_j$ enforced to satisfy $\mu_{ij} + \mu_{ji} = 1$ for all $i \neq j$ , ensuring exact symplecticity in floating point (Antoñana et al., 5 Nov 2025, Antoñana et al., 2022). Iterations are typically initialized using an extrapolation from the previous step (stage increments $L_j^{\text{(old)}}$ ), and convergence is monitored by the norm of the difference between successive iterates.

Simplified Newton Iteration: For stiffer or higher-dimensional problems, one may use a constant-Jacobian Newton step. The innovation reported in (Antoñana et al., 2017) is a reformulation converting the full $8d\times8d$ linear system solve, per Newton step, into multiple $d\times d$ solves by exploiting the structure of the Gauss–Legendre coefficients and the symplectic property. This is achieved by an orthogonal decomposition of $A$ , reducing the cost from $\mathcal{O}(512d^3)$ to $\mathcal{O}(5d^3)$ per iteration.

In all implementations, maintenance of symplecticity in finite precision requires precomputing the $\mu_{ij}$ coefficients to machine accuracy and employing safe accumulation and update strategies (e.g., compensated summation, FMA instructions), as detailed in (Antoñana et al., 2017).

3. SIMD-Vectorized Implementation (IRKGL16-SIMD)

The IRKGL16-SIMD implementation leverages parallel evaluation of all $s=8$ stages via SIMD (Single Instruction Multiple Data) instructions. Using Julia and the SIMD.jl library (Antoñana et al., 5 Nov 2025), the method wraps the $8 \times D$ stage array in an abstract type (e.g., VecArray{8,Float64}), enabling all arithmetic operations on entire stage vectors to be emitted as SIMD instructions (AVX/AVX-512). The right-hand-side function $f$ is authored generically and, provided it contains only SIMD-overloaded operations, is JIT-compiled for eightfold parallel evaluation over stages.

The mapping of the fixed-point or Newton updates to SIMD registers yields a 6–8× speedup over scalar implementations, saturating the floating-point pipelines on modern multi-core CPUs (Antoñana et al., 5 Nov 2025).

The following table summarizes key algorithmic features:

Algorithmic Feature	SIMD Implementation (IRKGL16-SIMD)	Fixed-point/Scalar Implementation
Nonlinear solves	8 stages using 8-wide SIMD double vector	Loop over 8 scalar stages
f-vectorization	Function $f$ is applied to SIMD arrays in parallel	Sequential calls to $f$
Hardware support	AVX/AVX-512 via Julia/LLVM + SIMD.jl	General CPU
Speedup (empirical)	6–8× vs. scalar IRKGL16	Baseline

4. Precision, Stability, and Round-Off Error Management

To ensure accuracy consistent with the 16th order, implementations must control round-off error at every stage and in the accumulation of updates. The recommended strategies include:

Compensated (Kahan) Summation for all vector accumulations in stage increments and in the final update $y_n$ (Antoñana et al., 2017).
Fused Multiply-Add (FMA) wherever supported.
Mixed Precision: In long-term Solar System integrations, all IRK solves and gradients are evaluated in 80-bit precision, while certain sub-flows (e.g., exact Kepler solutions) are performed in 128-bit (quadruple-precision) to maintain accuracy when increments are orders of magnitude smaller than the solution vector (Antoñana et al., 2022).
Convergence Testing: Fixed-point or Newton iterations are terminated when $||Y^{(k)} - Y^{(k-1)}||$ no longer improves (down to the meaningful precision).
Symplecticity Coefficient Enforcement: All critical Butcher tableau and derived coefficients are precomputed and symmetrized in machine arithmetic to guarantee $b_i a_{ij} + b_j a_{ji} = b_i b_j$ to round-off (Antoñana et al., 5 Nov 2025, Antoñana et al., 2017).

5. Performance Benchmarks and Applicability

Work–precision benchmarks compare IRKGL16-SIMD against a suite of high-order explicit symplectic integrators (e.g., Strang, Suzuki '90, Sobolev–Spaletta, Blanes–Moan, Blanes–Casas–Escorihuela). On three model Hamiltonian systems—charged-particle in Schwarzschild geometry, Solar System (6- and 15-/16-body), and Hénon–Heiles—the following empirical facts emerge (Antoñana et al., 5 Nov 2025, Antoñana et al., 2022):

For moderate accuracy (local errors $\sim 10^{-8}$ to $10^{-10}$ ), IRKGL16-SIMD overtakes explicit schemes of order $\leq10$ in both function evaluations and CPU time.
For very high precision (local error $\lesssim 10^{-13}$ , i.e., near machine epsilon in double precision), IRKGL16-SIMD achieves minimal integration cost (CPU time) compared to all tested explicit methods. The scalar (non-SIMD) IRKGL16 lags by $1.5$– $2\times$ but with SIMD, the implicit integrator becomes the fastest route to round-off-limited accuracy.
In the context of planetary integration (FCIRK16), close encounters between bodies are detected using proxies for analyticity radius; ambiguity is resolved by adaptively subdividing steps and using quadruple-precision sub-stepping, which preserves accuracy and still allows the implicit method’s superior overall precision/cost ratio to manifest (Antoñana et al., 2022).

A plausible implication is that for non-stiff Hamiltonian flows where very high precision is demanded, IRKGL16-SIMD (or its variants) now represent the most cost-effective algorithms.

6. Limitations and Guidelines for Use

The advantages of the symplectic 16th-order 8-stage IRK scheme—exact symplecticity, time reversibility, superconvergence, SIMD-parallelizable structure—are maximized under the following conditions (Antoñana et al., 5 Nov 2025):

High-Precision Applications: Local energy errors $\lesssim 10^{-10}$ or final aggregate errors approaching double-precision round-off.
Nonstiff Problems: Physical systems where the fixed-point (or simplified Newton) solver converges in 4–6 iterations.
Vectorizable Right-Hand Side: $f(y)$ must accept efficient SIMD vectorization (minimal conditional branching).
Moderate to Large State Dimension: SIMD acceleration is most evident in higher-dimensional systems.
Mixed-Precision Requirement: Appropriate when increment preservation (for $y_{n+1} - y_n$ ) requires accuracy exceeding native double-precision.

For lower-accuracy ( $>10^{-6}$ ) or strongly stiff systems, higher explicit symplectic or specialized stiff solvers may offer greater efficiency.

7. Algorithmic Summary: Pseudocode and Implementation Outline

A single IRKGL16-SIMD step for $y_{n-1} \in \mathbb{R}^D$ , $h > 0$ :

function IRKGL16_Step(y_prev, h, f, mu, nu, b, c)
    # y_prev: initial state, h: time step, f: vector field
    # mu, nu: precomputed symplectic coefficients, b: weights, c: nodes
    # All arrays are SIMD vectors (VecArray{8,Float64})
    Y = initial_guess(y_prev, nu, old_L)
    for k in 1:max_iter
        L = h .* b .* f(Y)
        Y_new = y_prev .+ mu * L
        if max(norm(Y_new - Y)) < tol
            break
        end
        Y = Y_new
    end
    y_next = y_prev .+ sum(L)
    return y_next
end

Key implementation details:

All operations over $i=1, \ldots, 8$ stages are carried out in 8-wide SIMD vector registers.
Accumulations and updates use compensated summation.
For fixed-point iteration, extrapolation from previous increments is recommended for initialization.
Precompute all $a_{ij}, b_i, c_i, \mu_{ij}$ in high precision and enforce symplectic symmetry constraints.

This fully vectorized approach realizes near-peak floating-point throughput on modern CPUs, enabling the high theoretical accuracy of the integrator to be practically realized in scientific simulation workflows.

References:

"SIMD-vectorized implicit symplectic integrators can outperform explicit ones" (Antoñana et al., 5 Nov 2025)
"Efficient implementation of symplectic implicit Runge-Kutta schemes with simplified Newton iterations" (Antoñana et al., 2017)
"An implicit symplectic solver for high-precision long term integrations of the Solar System" (Antoñana et al., 2022)