Symmetric Splitting Method for mCCAdL

Updated 1 January 2026

The method delivers second-order accuracy by decomposing dynamics into five operator flows for enhanced stability.
It leverages Strang splitting to enable larger integration steps, reducing gradient evaluations while maintaining efficiency.
Empirical results confirm improved performance in Bayesian sampling tasks, outperforming first-order integrators in various benchmarks.

The symmetric splitting method for the modified covariance-controlled adaptive Langevin (mCCAdL) thermostat is a second-order accurate integrator for large-scale Bayesian sampling, leveraging Strang splitting to organize the numerical propagation of the mCCAdL system into sub-steps corresponding to distinct physical and stochastic processes. This approach supersedes the original first-order Euler discretization in CCAdL, providing enhanced stability and allowing for substantially larger integration steps while maintaining accuracy and efficiency. Implementation of the symmetric splitting revolves around decomposing the system’s dynamics into five operator flows—Hamiltonian drift, stochastic gradient, covariance-controlled momentum scaling, Ornstein–Uhlenbeck thermostat, and Nosé–Hoover thermostat—executed in a pre-designed reversible sequence which yields superior numerical properties (Wei et al., 30 Dec 2025).

1. Formulation of mCCAdL Dynamics and Operator Splitting

The continuous-time mCCAdL system consists of coupled stochastic differential equations governing the evolution of position $q \in \mathbb{R}^n$ (parameters), momentum $p \in \mathbb{R}^n$ , and thermostat variable $\xi \in \mathbb{R}$ , formulated as:

$\begin{aligned} dq &= M^{-1}p\,dt \ dp &= -\nabla U(q)\,dt + \sqrt{h\,\Sigma(q)}\,dW - \frac{h}{2}\beta\,\Sigma(q)\,p\,dt - \xi\,p\,dt + \sqrt{2A\beta^{-1}}\,M^{1/2}dW_A \ d\xi &= \mu^{-1}(p^\top M^{-1} p - n k_B T)\,dt \end{aligned}$

with $U(q)$ as the potential, $\Sigma(q)$ the covariance of stochastic-gradient noise, $M$ the mass matrix, $\beta$ the inverse temperature, $A$ artificial friction, and $\mu$ thermal mass (Wei et al., 30 Dec 2025).

The generator is split into five sub-operators ("A", "B", "C", "O", "D"):

Operator	Physical Role	Update
A	Hamiltonian drift in $q$	$q \leftarrow q + t M^{-1}p$
B	Stochastic gradient force	$p \leftarrow p + t \widetilde{F}(q)$
C	Covariance-controlled momentum scaling	$p \leftarrow \exp[-t(h/2)\beta\Sigma(q)]p$
O	Ornstein–Uhlenbeck friction/noise on $p$	$p$ via explicit OU solution
D	Nosé–Hoover thermostat for $\xi$	$\xi \leftarrow \xi + t\mu^{-1}(p^\top M^{-1}p - nk_BT)$

2. High-Order Flows: Analytical and Numerical Steps

For operators A, B, D, and O, exact closed-form solutions are available. The C flow (covariance scaling) involves the matrix exponential $\exp[-\theta\Sigma(q)]p$ with $\theta = t(h/2)\beta$ , which is numerically approximated by a high-order scaling & squaring plus truncated Taylor series method (Wei et al., 30 Dec 2025):

Diagonal shift for numerical stability: $\widetilde{\Sigma} = -\theta\Sigma$ , $\mũ = \operatorname{tr}(\widetilde{\Sigma})/n$.
Centering: $\widehat{\Sigma} = \widetilde{\Sigma} - \mũ I$.
Taylor polynomial: $\exp(\widehat{\Sigma}) \approx \left(\sum_{j=0}^m \frac{1}{j!} (\widehat{\Sigma}/s)^j\right)^s$ for chosen $(m,s)$ .
Evaluate $v_{k+1}=\sum_{j=0}^m \frac{1}{j!} (\widehat{\Sigma}/s)^j v_k$ iteratively, $p = e^{\mũ}v_s$. This provides a second-order accurate solution to the scaling flow for $p$ .

3. Strang Splitting Sequence for mCCAdL Integration

The symmetric splitting organizes the five operator flows in the precise BAO D C D OAB sequence for each time-step $h$ :

$\exp(hL) \approx \varphi_B^{h/2} \circ \varphi_A^{h/2} \circ \varphi_O^{h/2} \circ \varphi_D^{h/2} \circ \varphi_C^h \circ \varphi_D^{h/2} \circ \varphi_O^{h/2} \circ \varphi_A^{h/2} \circ \varphi_B^{h/2}$

This symmetric composition guarantees inherently reversible propagation. Each flow is either exact or high-order, making the total method second-order weakly accurate for the invariant measure sampled by the stochastic system (Wei et al., 30 Dec 2025).

4. Algorithmic Realization and Stepwise Updates

The following pseudocode summarizes one step of symmetric splitting for mCCAdL:

for each iteration:
    p ← p + (h/2) · F̃(q)         # B-half-step
    q ← q + (h/2) · M⁻¹ · p      # A-half-step
    Update p by OU (O-half-step) # O
    xi ← xi + (h/2)/mu * [pᵀ M⁻¹ p - n k_B T] # D
    # Covariance scaling (C, full-step)
    Compute Σ(q)
    theta = (h/2) * beta
    Σ̃ = -theta Σ(q)
    mũ = trace(Σ̃)/n
    Σ̂ = Σ̃ - mũ I
    Iterate v_k with Taylor series
    p ← e^{mũ} v_s
    Repeat D- and O-half-steps
    q ← q + (h/2) · M⁻¹ · p
    p ← p + (h/2) · F̃(q)

Only one computation of the stochastic gradient and one computation of the covariance $\Sigma(q)$ are required per iteration, preserving efficiency compared to CCAdL (Wei et al., 30 Dec 2025).

5. Numerical Stability, Accuracy, and Performance

The symmetric splitting scheme confers marked improvements:

Stability Bound Enhancement: Step-size $h$ limits for mCCAdL are up to $10-100\times$ those of CCAdL. In Bayesian linear regression, mCCAdL is stable at $h \approx 5 \times 10^{-3}$ – $10^{-2}$ , while CCAdL blows up beyond $h > 10^{-3}$ . Similar gains are observed in MNIST, CIFAR-10 logistic regression, and discriminative RBM training (Wei et al., 30 Dec 2025).
Second-Order Convergence: Weakly second-order accuracy is demonstrated for the invariant measure.
Efficiency: Larger $h$ directly reduces the overall gradient evaluations required for a target accuracy.

Application	$h_{\max}$ CCAdL	$h_{\max}$ mCCAdL	Step-size Gain
Bayesian linear regression	$10^{-3}$	$5\times10^{-3}$ – $10^{-2}$	$5$– $10\times$
MNIST logistic regression	$1\times10^{-4}$	$1.2\times10^{-3}$	$12\times$
CIFAR-10 logistic regression	$1\times10^{-4}$	$1.5\times10^{-3}$	$15\times$
RBM training	$<3\times10^{-2}$	$3\times10^{-2}$	$10\times$

A plausible implication is that mCCAdL is well-suited for scenarios where per-step computational cost is critical and variance in the stochastic-gradient estimator is substantial.

6. Context: Relation to Classical Strang Splitting and Operator Splitting in Kinetic Schemes

The mCCAdL symmetric splitting method is conceptually linked to the Strang-splitting principle developed for operator-split kinetic equations. In cascaded Lattice Boltzmann methods (LBM) for fluid and scalar transport, the analogous symmetric operator-split methodology achieves second-order temporal accuracy and eliminates spurious force/source artifacts by projecting forcing only onto the central moments associated with conserved quantities (Hajabdollahi et al., 2018).

In classical Strang splitting, for two non-commuting operators $P$ , $Q$ , the propagator over $\Delta t$ is $e^{(\Delta t/2)P} e^{\Delta t Q} e^{(\Delta t/2)P}$ , yielding $O(\Delta t^2)$ global accuracy. In mCCAdL, this philosophy is adapted to stochastic gradient sampling, with stability and accuracy gains paralleling those established in LBM operator-splitting contexts.

7. Benchmarking and Empirical Results

Extensive computational experiments substantiate the advantages of symmetric splitting for mCCAdL:

mCCAdL attains the lowest 2-Wasserstein distance in Bayesian linear regression across all $h$ .
In MNIST/CIFAR-10 classification, mCCAdL maintains high predictive log-likelihood and test accuracy at step-sizes where CCAdL and alternatives become unstable.
Posterior mean log-loss in binary tasks is consistently minimized by mCCAdL, with CCAdL yielding NaNs beyond its critical $h$ .
Discriminative RBM multiclass training with mCCAdL at $h \sim 3\times10^{-2}$ achieves lowest test error; alternatives deteriorate or become unstable.

All empirical results confirm second-order weak accuracy and step-size resilience. These observations suggest that symmetric splitting for mCCAdL is broadly applicable to noisy-gradient thermodynamic sampling in large-scale models and settings with substantial stochastic gradient variance (Wei et al., 30 Dec 2025).