fGMRES: Adaptive Krylov Solver Method
- fGMRES is an iterative Krylov solver that adapts its preconditioning at each iteration to robustly handle evolving, variable-coefficient linear systems.
- It integrates a geometric multigrid V-cycle with restriction, smoothing, coarse solves, and prolongation to maintain O(N) complexity for high-resolution problems.
- The method leverages CPU parallelization and GPU acceleration, offering significant speed-ups in plasma turbulence and time-dependent PDE simulations.
The Flexible Generalised Minimal Residual Method (fGMRES) is an iterative Krylov subspace algorithm for solving large, sparse, and potentially time-dependent or ill-conditioned linear systems, particularly where the structure or application of the preconditioner may evolve during the course of the iteration. fGMRES is well suited to contexts such as plasma boundary turbulence simulations, where discretized elliptic partial differential equations must be solved efficiently and robustly as system coefficients vary across time steps and computational architectures demand high parallel performance. By enabling preconditioners that adapt over iterations, notably geometric multigrid cycles, fGMRES achieves superior convergence and robustness compared to standard fixed-preconditioner GMRES methods—especially in large-scale, high-resolution settings.
1. fGMRES Algorithmic Framework and Motivation
The fGMRES method generalizes the standard GMRES approach to allow iteration-dependent preconditioners. Given the linear system $Ax = b$, classical right-preconditioned GMRES constructs a Krylov subspace $\mathcal{K}_m(AM^{-1}, r_0)$ and determines the iterate $x_m$ by minimizing the residual norm over that subspace using a single fixed preconditioner $M$. fGMRES, in contrast, permits $M_j$ (the preconditioner at iteration $j$) to vary, thus decoupling the preconditioning operation from the strict recursion of the Krylov basis. In mathematical terms, the core iteration at step $j$ is

$$z_j = M_j^{-1} v_j,$$

where $v_j$ reflects the current search direction derived from the existing Krylov subspace (the $j$-th Arnoldi basis vector). The minimization of the residual norm $\|b - Ax_m\|_2$ is then carried out in the affine space $x_0 + \operatorname{span}\{z_1, \dots, z_m\}$ generated by all preconditioned directions up to step $m$.
This flexibility is essential in multi-physics simulations and time-dependent PDEs, where preconditioners such as multigrid cycles may need to adjust because of changing operator coefficients, variable mesh resolution, or hardware constraints (e.g., number of GPU threads, available memory). In particular, when the preconditioner is itself iterative and possibly nonlinear (as in adaptive multigrid V-cycles), fGMRES ensures that progress toward the solution is maintained even as the preconditioning changes.
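To make the structure concrete, below is a minimal single-cycle sketch of flexible GMRES in Python/NumPy. The names `fgmres` and `M_list` are illustrative, not the PARALLAX API. The distinguishing feature relative to standard GMRES is that the preconditioned directions $z_j = M_j^{-1} v_j$ are stored explicitly, so a different preconditioner may be applied at every step.

```python
import numpy as np

def fgmres(A, b, M_list=None, x0=None, m=30, tol=1e-8):
    """Flexible GMRES, single restart cycle (illustrative sketch).

    A      : callable computing the matrix-vector product v -> A @ v
    M_list : callable mapping the iteration index j to a preconditioner
             application v -> M_j^{-1} v; may differ at every j
    """
    n = b.shape[0]
    x = np.zeros(n) if x0 is None else x0.copy()
    r = b - A(x)
    beta = np.linalg.norm(r)
    if beta < tol:
        return x
    V = np.zeros((n, m + 1))   # Arnoldi basis vectors v_j
    Z = np.zeros((n, m))       # preconditioned directions z_j = M_j^{-1} v_j
    H = np.zeros((m + 1, m))   # Hessenberg matrix of the Arnoldi recurrence
    V[:, 0] = r / beta
    for j in range(m):
        # Apply the (possibly iteration-dependent) preconditioner.
        Z[:, j] = M_list(j)(V[:, j]) if M_list is not None else V[:, j]
        w = A(Z[:, j])
        # Modified Gram-Schmidt orthogonalization against the basis so far.
        for i in range(j + 1):
            H[i, j] = V[:, i] @ w
            w = w - H[i, j] * V[:, i]
        H[j + 1, j] = np.linalg.norm(w)
        # Small least-squares problem: min_y || beta*e1 - H_j y ||_2.
        e1 = np.zeros(j + 2)
        e1[0] = beta
        y, *_ = np.linalg.lstsq(H[:j + 2, :j + 1], e1, rcond=None)
        res = np.linalg.norm(e1 - H[:j + 2, :j + 1] @ y)
        if res < tol * beta or H[j + 1, j] < 1e-14:
            break
        V[:, j + 1] = w / H[j + 1, j]
    # x_m = x_0 + Z_m y_m: the correction lives in span{z_1, ..., z_m}.
    return x + Z[:, :j + 1] @ y
```

As an example, `M_list=lambda j: (lambda v: mg_cycle(np.zeros_like(v), v, 0))` would apply one multigrid V-cycle (Section 2) as the preconditioner at every iteration, and could equally select different cycle parameters per iteration.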
2. Geometric Multigrid Preconditioner Structure
The geometric multigrid preconditioner employed with fGMRES is based on constructing a hierarchy of discretized grids where the finest level ($\sigma = 0$) corresponds to the highest spatial resolution and each subsequent level is coarsened by a factor of two in both axes. For a generalized elliptic problem of the form

$$-\nabla \cdot \big(a(x, y)\, \nabla u\big) + c(x, y)\, u = f,$$

the discretization on level $\sigma$ produces a sparse matrix $A^\sigma$ with typically five nonzero entries per row, arising from a five-point finite-difference stencil.
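For illustration, the following SciPy sketch assembles the constant-coefficient version of this operator as a Kronecker sum (the helper name `five_point_laplacian` is hypothetical); variable coefficients change the entry values but not the at-most-five-nonzeros-per-row pattern.

```python
import scipy.sparse as sp

def five_point_laplacian(n, a=1.0, c=0.0):
    """Five-point stencil for -a*Laplacian(u) + c*u = f on an n x n interior
    grid of the unit square with homogeneous Dirichlet boundary conditions.
    Constant coefficients only; shown to illustrate the sparsity pattern."""
    h = 1.0 / (n + 1)
    # 1D second-difference operator, scaled by 1/h^2.
    T = sp.diags([-1.0, 2.0, -1.0], [-1, 0, 1], shape=(n, n)) / h**2
    I = sp.identity(n)
    # 2D operator as a Kronecker sum d2/dx2 + d2/dy2, plus the zeroth-order term.
    return (a * (sp.kron(I, T) + sp.kron(T, I)) + c * sp.identity(n * n)).tocsr()

A = five_point_laplacian(64)
print(A.shape, A.nnz)  # (4096, 4096), about five nonzeros per row
```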
The multigrid V-cycle used as a preconditioner consists of:
- Restriction: Residuals are restricted from fine (σ) to coarse (σ+1) levels using weighted averaging.
- Smoothing: At each level, a smoother—commonly a red–black Gauss–Seidel method parallelized over the mesh—is applied to reduce high-frequency errors (see the sketch after this list).
- Direct Coarse Solve: At the coarsest grid, the system is solved exactly via sparse LU decomposition (e.g., PARDISO or SuperLU).
- Prolongation: Error corrections computed on the coarse grid are prolongated back to the fine grid using, e.g., bilinear interpolation.
- Correction and Post-smoothing: Corrections are added to the current solution on each level, and post-smoothing is applied to further damp errors.
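The smoother referenced above can be sketched as follows for the constant-coefficient five-point stencil (a NumPy illustration, not the production kernel; array layout and function name are assumptions). Each half-sweep updates all points of one color simultaneously, since same-colored points are mutually independent under the five-point stencil; this independence is exactly what the OpenMP and GPU parallelizations of Section 3 exploit.

```python
import numpy as np

def redblack_gauss_seidel(u, f, h, sweeps=2):
    """One or more red-black Gauss-Seidel sweeps for -Laplacian(u) = f on a
    uniform grid with spacing h. `u` includes a one-point boundary frame,
    which the sweep never modifies (Dirichlet data enters through it)."""
    for _ in range(sweeps):
        for color in (0, 1):          # red half-sweep, then black
            for parity in (0, 1):     # split interior rows by parity
                i0 = 1 + parity
                j0 = 1 + (color ^ parity)   # selects (i + j) % 2 == color
                u[i0:-1:2, j0:-1:2] = 0.25 * (
                    u[i0 - 1:-2:2, j0:-1:2] + u[i0 + 1::2, j0:-1:2]
                    + u[i0:-1:2, j0 - 1:-2:2] + u[i0:-1:2, j0 + 1::2]
                    + h * h * f[i0:-1:2, j0:-1:2])
    return u
```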
The pseudocode for the V-cycle, as invoked from the fGMRES preconditioning step, is as follows:
```
mg_cycle(x^σ, b^σ, σ):
    if σ == N_σ
        x^σ = (A^σ)^−1 b^σ           # direct sparse LU solve on the coarsest grid
        return x^σ
    end if
    for i in 1:nsmooth_pre            # pre-smoothing
        x^σ = smoother(x^σ, b^σ)
    end for
    d^σ = A^σ x^σ − b^σ               # residual
    d^(σ+1) = restrict(d^σ)           # restriction to the coarser level
    e^(σ+1) = mg_cycle(0, d^(σ+1), σ+1)   # recursive coarse-grid solve
    e^σ = prolong(e^(σ+1))            # prolongation of the error correction
    x^σ = x^σ − e^σ                   # coarse-grid correction
    for i in 1:nsmooth_post           # post-smoothing
        x^σ = smoother(x^σ, b^σ)
    end for
    return x^σ
```
This multigrid procedure, when embedded as the preconditioning step within fGMRES, ensures the overall iteration exhibits O(N) complexity with respect to the number of unknowns N—a critical scaling property for large-scale turbulence simulations.
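The remaining building blocks of the cycle, restriction and prolongation, admit equally compact sketches. Below, full weighting is used for restriction and bilinear interpolation for prolongation, one common concrete choice consistent with the description above; the production stencils may differ. Grids are assumed to hold $2m+1$ points per axis, including the boundary frame.

```python
import numpy as np

def restrict_full_weighting(d):
    """Full-weighting restriction of a fine-grid residual d, shape
    (2m+1, 2m+1) including the boundary frame, onto the (m+1, m+1)
    coarse grid. Weights: 1/4 centre, 1/8 edges, 1/16 corners."""
    m = d.shape[0] // 2
    dc = np.zeros((m + 1, m + 1))
    dc[1:-1, 1:-1] = (
        4.0 * d[2:-2:2, 2:-2:2]
        + 2.0 * (d[1:-3:2, 2:-2:2] + d[3:-1:2, 2:-2:2]
                 + d[2:-2:2, 1:-3:2] + d[2:-2:2, 3:-1:2])
        + d[1:-3:2, 1:-3:2] + d[1:-3:2, 3:-1:2]
        + d[3:-1:2, 1:-3:2] + d[3:-1:2, 3:-1:2]) / 16.0
    return dc

def prolong_bilinear(e):
    """Bilinear interpolation of a coarse-grid correction e, shape
    (m+1, m+1), back onto the (2m+1, 2m+1) fine grid."""
    m = e.shape[0] - 1
    ef = np.zeros((2 * m + 1, 2 * m + 1))
    ef[::2, ::2] = e                                    # coincident points
    ef[1::2, ::2] = 0.5 * (e[:-1, :] + e[1:, :])        # vertical midpoints
    ef[::2, 1::2] = 0.5 * (e[:, :-1] + e[:, 1:])        # horizontal midpoints
    ef[1::2, 1::2] = 0.25 * (e[:-1, :-1] + e[:-1, 1:]
                             + e[1:, :-1] + e[1:, 1:])  # cell centres
    return ef

d = np.zeros((9, 9)); d[4, 4] = 1.0
print(restrict_full_weighting(d).shape)          # (5, 5)
print(prolong_bilinear(np.zeros((5, 5))).shape)  # (9, 9)
```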
3. Implementation and Parallelization Strategies
Efficient computational realization of the fGMRES-multigrid solver leverages a suite of parallel and accelerated hardware features:
- CPU Parallelization: OpenMP is used to parallelize the main field solve, particularly by dividing grid loops (such as over poloidal planes) among available CPU cores. The red–black smoother naturally allows parallel updates, and restriction/prolongation steps are likewise parallelizable.
- GPU Acceleration: For substantial speed-ups, key routines—the smoother and the grid-transfer operators—are offloaded to GPUs using the PAccX library, which provides both CUDA and HIP backends. The GPU code is written in C/C++, with ISO_C_BINDING providing Fortran-side language interoperability; the solver logic is encapsulated so that different backends, including external libraries such as rocALUTION, can be swapped in or benchmarked easily.
Benchmarked performance shows GPU implementations achieving up to 15× speed-ups compared to CPU versions, especially on large grids. For moderate sizes, the custom solver outperforms rocALUTION (BiCGStab + algebraic multigrid) by factors of 2–4. The O(N) scaling of the multigrid approach is maintained in both the coefficient-update/matrix-assembly phase and the solve phase.
4. Interaction with Time-Dependent and Nonstationary Coefficients
In edge turbulence simulations for magnetic confinement fusion plasmas, the coefficients and boundary values are updated at each time step due to changes in plasma parameters and field geometry. Direct solvers are impractical since LU (or Cholesky) factorizations must be recomputed each time, incurring excessive cost.
The fGMRES-multigrid paradigm is effective in this setting for two key reasons:
- The structure of the V-cycle can be dynamically adjusted as coefficients change (e.g., by modifying the number/type of smoothing steps or relaxation parameters), naturally accommodating the requirements of the evolving system.
- Preconditioner flexibility in fGMRES provides robustness, maintaining fast convergence even as the discrete operator and its spectrum vary strongly between steps.
This approach is adopted in production turbulence codes such as GRILLIX and GENE-X, in which the solution of generalised 2D elliptic equations on drift planes constitutes the core computational bottleneck.
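A toy sketch of the per-step workflow is shown below (all names illustrative, solver calls elided): because the five-point sparsity pattern is fixed, only the matrix values need refreshing each step, at O(N) cost and with no refactorization; the same refresh is applied on every multigrid level.

```python
import scipy.sparse as sp

n = 64
T = sp.diags([-1.0, 2.0, -1.0], [-1, 0, 1], shape=(n, n))
A = (sp.kron(sp.identity(n), T) + sp.kron(T, sp.identity(n))).tocsr()
base_values = A.data.copy()       # stencil values for a unit coefficient

for step in range(3):
    a_t = 1.0 + 0.1 * step        # stand-in for the evolving plasma coefficient
    A.data = a_t * base_values    # O(N) in-place value update; pattern reused
    # fGMRES preconditioned by one V-cycle would solve the updated system
    # here (see the sketches in Sections 1 and 2); no LU refactorization of
    # the fine-grid operator is ever required.
```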
5. Comparison with External and Alternative Solvers
The fGMRES-multigrid solver integrated in the PARALLAX/PAccX libraries has been compared with external high-performance libraries, notably rocALUTION's BiCGStab with smoothed-aggregation algebraic multigrid preconditioning. Empirically:
- For small to moderate system sizes, fGMRES-multigrid achieves 2–4× reduction in solution time.
- For boundary-dominated systems with mixed Dirichlet/Neumann data or complex local mesh refinements, geometric multigrid exhibits enhanced robustness.
- The modular design allows direct switching between, and benchmarking of, different preconditioners and solvers.
The ability to swap preconditioners at each fGMRES iteration also makes the approach well suited to environments with dynamic resource allocation (heterogeneous hardware, adaptive mesh settings) and to future architectural migration.
6. Significance in Production-Scale Plasma Turbulence Simulations
The combination of fGMRES with geometric multigrid preconditioning enables highly efficient, large-scale solutions of generalised Poisson-like elliptic problems under nonstationary, real-world conditions encountered in plasma physics. By exploiting both the mathematical optimality (O(N) scaling, adaptivity to coefficient/time variation) and computational optimality (CPU parallelism, GPU acceleration, modularity), this solver addresses the principal computational bottleneck in whole-device modelling of turbulent plasmas.
The integrated approach embodied in PARALLAX/PAccX serves as a backbone for modern high-fidelity boundary turbulence codes and is essential for scalable, accurate, and robust simulation of magnetic confinement devices.
Summary Table: Key Aspects of fGMRES-Multigrid Solver in Plasma Turbulence
Feature | Implementation | Advantage
---|---|---
Preconditioner flexibility (fGMRES) | Preconditioner may change at each iteration | Robustness to time-dependent system changes
Geometric multigrid V-cycle | Recursive, O(N) cost | Maintains linear scaling with mesh refinement
Parallelization (OpenMP, GPU) | CPU/GPU backends (PAccX) | High performance on large-scale simulations
Modular solver framework (PARALLAX/PAccX) | Swappable components | Code maintenance and portability
Robustness to complex boundaries | Geometry-aware | Handles mixed boundary conditions/generic 2D meshes