PARALLAX/PAccX Computational Libraries
- PARALLAX/PAccX libraries are high-performance computational tools designed to solve the generalized elliptic equations arising in plasma boundary turbulence, combining finite-difference discretization with handling of complex magnetic geometry.
- They employ an iterative fGMRES method with a geometric multigrid preconditioner and parallel red–black Gauss–Seidel smoothing, achieving O(N) scaling while supporting robust, dynamic coefficient updates.
- By providing both CPU and GPU backends, the libraries integrate with simulation codes such as GRILLIX and GENE-X, delivering up to 15× GPU-over-CPU speedup in plasma edge/SOL modeling.
The PARALLAX/PAccX libraries are high-performance computational libraries designed for the efficient solution of generalized elliptic equations arising in plasma boundary turbulence simulations. These libraries serve as a foundational component in production codes such as GRILLIX (drift-fluid) and GENE-X (full-f gyrokinetic), providing both CPU and GPU-accelerated solvers for the complex, variable-coefficient problems ubiquitous in edge/SOL modeling for magnetic confinement fusion.
1. Mathematical Formulation and Discretization
The principal challenge addressed by PARALLAX/PAccX is the solution of generalized elliptic equations derived from plasma physics, notably those appearing in Poisson and Ampère's law formulations. The equations are abstracted as

$$\nabla \cdot \left( a \, \nabla_\perp u \right) + b \, u = c,$$

where $u$ is the electrostatic or perturbed magnetic potential, and the coefficients $a$, $b$, and $c$ (potentially spatially and temporally dependent) encapsulate the diverse physical parameters, such as density and temperature. The perpendicular gradient $\nabla_\perp$ ensures fidelity to the magnetic geometry.
A finite-difference discretization is adopted on a set of two-dimensional drift (or poloidal) planes. Each plane uses a logically unstructured but locally Cartesian mesh, facilitating accurate representation of complex physical boundaries (e.g., separatrix, X-points, open field lines). At an interior point, the discretization induces a five-point stencil coupling the point to its four nearest in-plane neighbours. Boundary conditions are applied locally: Dirichlet boundaries directly set values, while Neumann boundaries are imposed through upstream values determined with a discrete level-set approach.
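A minimal sketch of this discretization is given below, assuming the abstract form $\nabla \cdot (a\,\nabla_\perp u) + b\,u = c$ on a uniform patch with spacing $h$ and Dirichlet values preloaded into a ghost ring; the data layout and names are illustrative rather than the PARALLAX internals.

```cpp
// Minimal sketch: matrix-free application of the five-point stencil for
// div(a grad_perp(u)) + b*u on a uniform nx-by-ny Cartesian patch with
// spacing h. Dirichlet values are assumed to be pre-loaded into the ghost
// ring of u; layout and naming are illustrative, not the PARALLAX internals.
#include <cstddef>
#include <vector>

struct Plane {
    int nx, ny;                    // interior points per direction
    double h;                      // uniform grid spacing
    std::vector<double> a, b, u;   // size (nx+2)*(ny+2), including ghost ring
    std::size_t idx(int i, int j) const { return std::size_t(j) * (nx + 2) + i; }
};

// y = A(u): five-point stencil with face-centred averages of the coefficient a.
std::vector<double> apply_operator(const Plane& p) {
    std::vector<double> y(p.u.size(), 0.0);
    const double inv_h2 = 1.0 / (p.h * p.h);
    for (int j = 1; j <= p.ny; ++j) {
        for (int i = 1; i <= p.nx; ++i) {
            const std::size_t c = p.idx(i, j);
            // arithmetic averages of a at the four cell faces
            const double aE = 0.5 * (p.a[c] + p.a[p.idx(i + 1, j)]);
            const double aW = 0.5 * (p.a[c] + p.a[p.idx(i - 1, j)]);
            const double aN = 0.5 * (p.a[c] + p.a[p.idx(i, j + 1)]);
            const double aS = 0.5 * (p.a[c] + p.a[p.idx(i, j - 1)]);
            y[c] = inv_h2 * (aE * (p.u[p.idx(i + 1, j)] - p.u[c])
                           - aW * (p.u[c] - p.u[p.idx(i - 1, j)])
                           + aN * (p.u[p.idx(i, j + 1)] - p.u[c])
                           - aS * (p.u[c] - p.u[p.idx(i, j - 1)]))
                 + p.b[c] * p.u[c];
        }
    }
    return y;
}
```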
2. Iterative Solver with Geometric Multigrid Preconditioning
Because the coefficients and boundary conditions in these elliptic equations may change in time, direct matrix factorization is computationally prohibitive. Instead, PARALLAX/PAccX employ an iterative approach based on a flexible generalized minimal residual method (fGMRES), with a robust geometric multigrid preconditioner. The multigrid hierarchy is generated by subsampling every second grid point from finer to coarser levels.
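The outer iteration can be illustrated with a textbook flexible GMRES sketch: the operator and the (possibly changing) preconditioner are passed as callables, and the preconditioned basis vectors are stored explicitly, which is precisely what distinguishes fGMRES from standard GMRES. The vector type, fixed Krylov dimension, and absolute-residual stopping test are simplifications and do not reflect the PARALLAX implementation.

```cpp
// Textbook single-cycle flexible GMRES (FGMRES) sketch on plain std::vector data.
#include <cmath>
#include <cstddef>
#include <functional>
#include <vector>

using Vec = std::vector<double>;

static double dot(const Vec& a, const Vec& b) {
    double s = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i) s += a[i] * b[i];
    return s;
}
static double nrm(const Vec& a) { return std::sqrt(dot(a, a)); }
static void axpy(double alpha, const Vec& x, Vec& y) {
    for (std::size_t i = 0; i < x.size(); ++i) y[i] += alpha * x[i];
}

// One FGMRES cycle of dimension m: the preconditioner M may vary between
// iterations, so the preconditioned vectors z_j = M(v_j) are stored explicitly.
Vec fgmres(const std::function<Vec(const Vec&)>& A,
           const std::function<Vec(const Vec&)>& M,
           const Vec& b, Vec x, int m, double tol) {
    Vec r = b;                                  // r = b - A*x
    Vec Ax = A(x);
    for (std::size_t i = 0; i < r.size(); ++i) r[i] -= Ax[i];
    double beta = nrm(r);
    if (beta < tol) return x;

    std::vector<Vec> V(m + 1), Z(m);            // Krylov basis and preconditioned vectors
    std::vector<std::vector<double>> H(m + 1, std::vector<double>(m, 0.0));
    std::vector<double> cs(m), sn(m), g(m + 1, 0.0);
    V[0] = r;
    for (double& v : V[0]) v /= beta;
    g[0] = beta;

    int k = 0;
    for (; k < m; ++k) {
        Z[k] = M(V[k]);                         // flexible step: keep M(v_k)
        Vec w = A(Z[k]);
        for (int j = 0; j <= k; ++j) {          // Arnoldi, modified Gram-Schmidt
            H[j][k] = dot(w, V[j]);
            axpy(-H[j][k], V[j], w);
        }
        H[k + 1][k] = nrm(w);
        if (H[k + 1][k] > 1e-14) {
            V[k + 1] = w;
            for (double& v : V[k + 1]) v /= H[k + 1][k];
        }
        for (int j = 0; j < k; ++j) {           // apply previous Givens rotations
            const double t = cs[j] * H[j][k] + sn[j] * H[j + 1][k];
            H[j + 1][k] = -sn[j] * H[j][k] + cs[j] * H[j + 1][k];
            H[j][k] = t;
        }
        const double d = std::hypot(H[k][k], H[k + 1][k]);
        cs[k] = H[k][k] / d;
        sn[k] = H[k + 1][k] / d;
        H[k][k] = d;
        H[k + 1][k] = 0.0;
        g[k + 1] = -sn[k] * g[k];               // update the residual norm estimate
        g[k] = cs[k] * g[k];
        if (std::fabs(g[k + 1]) < tol) { ++k; break; }
    }
    std::vector<double> y(k);                   // back-substitution for the LS solution
    for (int i = k - 1; i >= 0; --i) {
        double s = g[i];
        for (int j = i + 1; j < k; ++j) s -= H[i][j] * y[j];
        y[i] = s / H[i][i];
    }
    for (int j = 0; j < k; ++j) axpy(y[j], Z[j], x);   // x += Z*y
    return x;
}
```

Storing the vectors $z_j = M(v_j)$ is what allows a multigrid V-cycle, whose action may differ slightly from call to call, to serve as the preconditioner without breaking the Krylov recurrence.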
Key features of the geometric multigrid cycle are:
- Direct solution (e.g., by LU decomposition) at the coarsest level.
- Red–black Gauss–Seidel smoothing on each level; the smoother is split into three independently updatable groups (red, black, and boundary points) to enable parallel execution.
- Residuals are restricted using weighted averaging; corrections are prolongated via bilinear interpolation.
- Recursive correction cycles with pre- and post-smoothing.
A pseudocode summary:
```
if (on coarsest level)
    solve directly (e.g., LU)
else
    pre-smooth (red–black Gauss–Seidel)
    compute residual d = A·x - b
    restrict residual to coarser level
    multigrid recursive correction
    prolong correction and update x
    post-smooth
```
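The recursion can be made concrete on a one-dimensional constant-coefficient model problem. The sketch below keeps the structure of the pseudocode (pre-smoothing, restriction by weighted averaging, recursive correction, prolongation by linear interpolation, post-smoothing) but replaces the coarsest-level LU solve with repeated smoothing; it is not the 2D variable-coefficient production cycle.

```cpp
// Minimal recursive V-cycle for -u'' = f on n = 2^k - 1 interior points,
// spacing h, homogeneous Dirichlet boundaries. Illustrative only.
#include <vector>

using Vec = std::vector<double>;

// One red-black (odd/even) Gauss-Seidel sweep for -u'' = f.
static void smooth(Vec& u, const Vec& f, double h2) {
    const int n = int(u.size());
    for (int colour = 0; colour < 2; ++colour)
        for (int i = colour; i < n; i += 2) {
            const double uw = (i > 0) ? u[i - 1] : 0.0;
            const double ue = (i < n - 1) ? u[i + 1] : 0.0;
            u[i] = 0.5 * (uw + ue + h2 * f[i]);
        }
}

static Vec residual(const Vec& u, const Vec& f, double h2) {
    const int n = int(u.size());
    Vec r(n);
    for (int i = 0; i < n; ++i) {
        const double uw = (i > 0) ? u[i - 1] : 0.0;
        const double ue = (i < n - 1) ? u[i + 1] : 0.0;
        r[i] = f[i] - (2.0 * u[i] - uw - ue) / h2;
    }
    return r;
}

// V-cycle: pre-smooth, restrict residual, recurse, prolong correction, post-smooth.
void vcycle(Vec& u, const Vec& f, double h) {
    const int n = int(u.size());
    const double h2 = h * h;
    if (n <= 3) {                                // coarsest level: repeated smoothing
        for (int s = 0; s < 50; ++s) smooth(u, f, h2);   // stands in for the LU solve
        return;
    }
    for (int s = 0; s < 2; ++s) smooth(u, f, h2);        // pre-smoothing
    Vec r = residual(u, f, h2);

    const int nc = (n - 1) / 2;                  // coarse grid: every second fine point
    Vec rc(nc), ec(nc, 0.0);
    for (int i = 0; i < nc; ++i)                 // full-weighting restriction
        rc[i] = 0.25 * r[2 * i] + 0.5 * r[2 * i + 1] + 0.25 * r[2 * i + 2];

    vcycle(ec, rc, 2.0 * h);                     // recursive coarse-grid correction

    for (int i = 0; i < nc; ++i) {               // linear-interpolation prolongation
        u[2 * i + 1] += ec[i];
        u[2 * i]     += 0.5 * ec[i];
        u[2 * i + 2] += 0.5 * ec[i];
    }
    for (int s = 0; s < 2; ++s) smooth(u, f, h2);        // post-smoothing
}
```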
3. Parallelization and Hardware Acceleration
Parallelism in PARALLAX is achieved on CPUs using OpenMP to parallelize over the grid points within each poloidal plane. The encompassing simulation frameworks (GRILLIX, GENE-X) employ additional domain decomposition along the toroidal direction using MPI, assigning each poloidal plane to a distinct MPI rank.
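A sketch of the intra-plane parallelism, assuming the same uniform five-point patch as above: points of one colour are mutually independent, so each colour sweep is a plain OpenMP parallel loop. The separate boundary-point group of the production smoother is omitted here, and the layout is illustrative.

```cpp
// One red-black Gauss-Seidel sweep for div(a grad_perp u) + b*u = c on a
// uniform patch with Dirichlet ghost values. Arrays are sized (nx+2)*(ny+2).
// Points of one colour never neighbour each other, so the row loop can be a
// simple OpenMP parallel-for. Compile with -fopenmp.
#include <omp.h>
#include <vector>

void rb_gauss_seidel_sweep(int nx, int ny, double h,
                           const std::vector<double>& a,
                           const std::vector<double>& b,
                           const std::vector<double>& c,
                           std::vector<double>& u) {
    const int stride = nx + 2;                   // row length including ghost ring
    const double inv_h2 = 1.0 / (h * h);
    for (int colour = 0; colour < 2; ++colour) {
        #pragma omp parallel for schedule(static)
        for (int j = 1; j <= ny; ++j) {
            const int i0 = ((j + colour) % 2) ? 1 : 2;   // first point of this colour
            for (int i = i0; i <= nx; i += 2) {
                const int k = j * stride + i;
                const double aE = 0.5 * (a[k] + a[k + 1]);
                const double aW = 0.5 * (a[k] + a[k - 1]);
                const double aN = 0.5 * (a[k] + a[k + stride]);
                const double aS = 0.5 * (a[k] + a[k - stride]);
                const double diag = -(aE + aW + aN + aS) * inv_h2 + b[k];
                const double off  = inv_h2 * (aE * u[k + 1] + aW * u[k - 1]
                                            + aN * u[k + stride] + aS * u[k - stride]);
                u[k] = (c[k] - off) / diag;      // Gauss-Seidel update of one colour
            }
        }
    }
}
```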
To exploit accelerators, the PAccX library provides GPU support. The core components (fGMRES kernel, matrix assembly, and multigrid preconditioner) are ported to C/C++ and implemented with CUDA and HIP backends. Immutable data—such as grid topology and transfer operators—are initialized once and kept resident on the device, while only variable coefficients and vectors are transferred in the time loop. For large-scale problems, the GPU implementation achieves speedups up to 15× relative to OpenMP CPU versions, exhibiting near-linear scaling with problem size.
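The staging pattern can be sketched with the CUDA runtime API as follows (the HIP variant is analogous); `solve_on_device` is a placeholder stub rather than the PAccX entry point, and the buffer layout is illustrative.

```cpp
// Sketch of device data staging: immutable topology data is uploaded once,
// while only the time-dependent coefficients and right-hand side cross the
// host-device link inside the time loop.
#include <cuda_runtime.h>
#include <vector>

// Stand-in for the GPU solver entry point (fGMRES + multigrid); a real
// implementation would launch the device kernels here.
static void solve_on_device(const int*, const double*, const double*, double*, int) {}

void run_time_loop(const std::vector<int>& stencil_topology,   // immutable
                   std::vector<double>& coeffs,                 // updated every step
                   std::vector<double>& rhs,
                   std::vector<double>& solution,
                   int n_steps) {
    const int n = int(rhs.size());
    int* d_topo = nullptr;
    double *d_coef = nullptr, *d_rhs = nullptr, *d_sol = nullptr;

    // One-time setup: allocate and upload data that never changes.
    cudaMalloc(reinterpret_cast<void**>(&d_topo), stencil_topology.size() * sizeof(int));
    cudaMemcpy(d_topo, stencil_topology.data(),
               stencil_topology.size() * sizeof(int), cudaMemcpyHostToDevice);
    cudaMalloc(reinterpret_cast<void**>(&d_coef), coeffs.size() * sizeof(double));
    cudaMalloc(reinterpret_cast<void**>(&d_rhs),  rhs.size() * sizeof(double));
    cudaMalloc(reinterpret_cast<void**>(&d_sol),  solution.size() * sizeof(double));

    for (int step = 0; step < n_steps; ++step) {
        // Per-step traffic: only the variable coefficients and the RHS move.
        cudaMemcpy(d_coef, coeffs.data(), coeffs.size() * sizeof(double),
                   cudaMemcpyHostToDevice);
        cudaMemcpy(d_rhs, rhs.data(), rhs.size() * sizeof(double),
                   cudaMemcpyHostToDevice);

        solve_on_device(d_topo, d_coef, d_rhs, d_sol, n);

        cudaMemcpy(solution.data(), d_sol, solution.size() * sizeof(double),
                   cudaMemcpyDeviceToHost);
        // ... advance the turbulence time step, updating coeffs and rhs ...
    }

    cudaFree(d_topo); cudaFree(d_coef); cudaFree(d_rhs); cudaFree(d_sol);
}
```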
4. Solver Abstraction and Integration
The implementation exposes three essential routines: `create` (setup), `update` (inject new coefficients and boundary data), and `solve` (iterative solution). This abstract interface simplifies the incorporation of alternative solvers and enables benchmarking against external libraries. Mesh ordering can be adapted (lexicographic, Morton, or red–black) to optimize locality and cache usage.
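A hypothetical rendering of this three-routine contract is shown below; the routine names follow the description above, but the signatures and types are illustrative and do not reproduce the actual PARALLAX/PAccX API.

```cpp
// Hypothetical create/update/solve contract; signatures are illustrative only.
#include <memory>
#include <vector>

struct Mesh;   // grid topology, boundary flags, ordering (defined elsewhere)

class EllipticSolver {
public:
    virtual ~EllipticSolver() = default;

    // create: one-time setup of grid-dependent data (stencil topology,
    // multigrid hierarchy, transfer operators, device buffers).
    static std::unique_ptr<EllipticSolver> create(const Mesh& mesh);

    // update: inject new coefficients and boundary data without redoing setup.
    virtual void update(const std::vector<double>& a,
                        const std::vector<double>& b,
                        const std::vector<double>& boundary_values) = 0;

    // solve: run the preconditioned iterative solve for a given right-hand side.
    virtual void solve(const std::vector<double>& rhs,
                       std::vector<double>& u,
                       double tolerance) = 0;
};

// Typical call pattern inside a turbulence time loop:
//   auto solver = EllipticSolver::create(mesh);      // once
//   for each time step {
//       solver->update(a, b, boundary_values);       // coefficients change in time
//       solver->solve(rhs, potential, 1e-8);
//   }
```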
Comparison studies with rocALUTION—utilizing BiCGStab and SAAMG preconditioning—demonstrate that the in-house geometric multigrid–preconditioned fGMRES solver in PARALLAX/PAccX is generally 2–4× faster on the node level and robust across diverse system configurations.
5. Role in Plasma Boundary Turbulence Codes
In both GRILLIX and GENE-X, the elliptic solver forms a computational bottleneck due to the requirement for frequent, time-dependent solutions of equations with strongly varying coefficients. The efficient design of the PARALLAX/PAccX solvers—especially their ability to rapidly recompute with updated coefficients and adapt to mesh complexity—enables full-f turbulence simulations in edge/SOL regimes without the prohibitive cost of direct solvers. The approach is central to simulation of the electrostatic potential and the parallel component of the magnetic vector potential at every time step.
6. Design Innovations and Performance Characteristics
Several aspects distinguish this solver:
- Parallelized red–black Gauss–Seidel smoothers subdivided for concurrency.
- Mesh ordering schemes adapt memory layout for optimal performance.
- Seamless CPU/GPU integration with data staging minimizes transfer overheads.
- High robustness for time-dependent, variable-coefficient systems characteristic of plasma turbulence at the boundary.
Benchmarking confirms near-linear, O(N) scaling of compute time with system size, a property necessary for large, production-scale simulations.
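As a concrete illustration of the mesh orderings mentioned above, the following generic sketch contrasts lexicographic indexing with Morton (Z-order) indexing, which interleaves the bits of the two in-plane coordinates so that spatially adjacent points tend to stay close in memory; it is not the library's index map.

```cpp
// Generic illustration of two mesh orderings for a 2D plane.
#include <cstdint>

// Lexicographic (row-major) index of point (i, j) on an nx-wide plane.
inline std::uint64_t lex_index(std::uint32_t i, std::uint32_t j, std::uint32_t nx) {
    return static_cast<std::uint64_t>(j) * nx + i;
}

// Spread the bits of x so that they occupy the even bit positions.
inline std::uint64_t spread_bits(std::uint32_t x) {
    std::uint64_t v = x;
    v = (v | (v << 16)) & 0x0000FFFF0000FFFFull;
    v = (v | (v << 8))  & 0x00FF00FF00FF00FFull;
    v = (v | (v << 4))  & 0x0F0F0F0F0F0F0F0Full;
    v = (v | (v << 2))  & 0x3333333333333333ull;
    v = (v | (v << 1))  & 0x5555555555555555ull;
    return v;
}

// Morton (Z-order) index: bits of i and j interleaved.
inline std::uint64_t morton_index(std::uint32_t i, std::uint32_t j) {
    return spread_bits(i) | (spread_bits(j) << 1);
}
```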
7. Comparison to Alternative Libraries and Broader Impact
In head-to-head comparisons, rocALUTION’s BiCGStab with SAAMG preconditioning is competitive but typically outperformed by PARALLAX/PAccX on both CPUs and GPUs. The designed abstraction layer further facilitates rapid evaluation of future library alternatives as scientific requirements evolve.
A plausible implication is that the PARALLAX/PAccX libraries, due to their flexible abstraction, hardware optimization, and integration with modern simulation workflows, provide a reference implementation for elliptic solvers in plasma turbulence modeling, particularly in regimes characterized by complex geometry and dynamic coefficients (Stegmeir et al., 15 Sep 2025).