Bucketed SPAI-GMRES-IR: Mixed-Precision Solver
- Bucketed SPAI-GMRES-IR is a mixed-precision Krylov subspace method that leverages adaptive bucketed sparse approximate inverse preconditioning to efficiently solve sparse systems.
- It partitions the preconditioner’s entries into buckets assigned to different precision levels, balancing computational cost with accuracy.
- Empirical results demonstrate up to 60% storage reduction for the preconditioner and 2–5× lower wall time than double-precision direct solves, while maintaining backward and forward error bounds equivalent to uniform-precision approaches.
Bucketed SPAI-GMRES-IR is a mixed-precision Krylov subspace method for solving sparse linear systems, leveraging bucketed (adaptive-precision) sparse approximate inverse (SPAI) preconditioning within a GMRES-based iterative refinement (IR) framework. The method targets reduction in computational cost and memory consumption enabled by recent hardware trends supporting multiple floating-point precisions, while maintaining accuracy guarantees equivalent to uniform-precision approaches under specified condition number regimes (Khan et al., 2023, Jiao, 2023).
1. Construction of the Bucketed SPAI Preconditioner
Let $A$ be a nonsingular $n \times n$ matrix, and let $M \approx A^{-1}$ denote a right preconditioner computed using the Frobenius-norm SPAI algorithm. Once $M$ is constructed (typically in higher precision), its nonzero entries are partitioned into “buckets” according to their magnitudes, with each bucket assigned a corresponding precision level.
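Given precisions with unit roundoffs $u_1 < u_2 < \dots < u_q$ (ordered from highest to lowest precision) and a target threshold $\varepsilon$, define for each row $i$ and for $k = 1, \dots, q$ intervals $P_{ik}$ of the form
- $P_{i1} = (\varepsilon \beta_i / u_2,\ +\infty)$,
- $P_{ik} = (\varepsilon \beta_i / u_{k+1},\ \varepsilon \beta_i / u_k]$, $k = 2, \dots, q-1$,
- $P_{iq} = [0,\ \varepsilon \beta_i / u_q]$,
where $\beta_i$ is a row-dependent scale, such as a norm of the $i$th row of $M$.
The bucket $B_{ik}$ for row $i$ and precision $u_k$ consists of column indices $j$ such that $|m_{ij}| \in P_{ik}$. After bucketing, a SpMV $y = Mx$ is evaluated by computing, for each $i$ and $k$, $y_i^{(k)} = \sum_{j \in B_{ik}} m_{ij} x_j$ in precision $u_k$, then summing $y_i = \sum_{k} y_i^{(k)}$ in the highest precision $u_1$ (Khan et al., 2023).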
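The bucketing rule and the bucketed SpMV can be illustrated with a small NumPy sketch. It is not the authors' implementation: the precision ladder (double, single, half, then a final level that drops entries), the use of the row infinity norm as the row scale, and the function names `bucket_rows` and `bucketed_spmv` are all assumptions made for illustration. Entries larger than the threshold times the row scale divided by the next level's unit roundoff stay at the current level. Values are cast on the fly rather than stored in reduced precision, and half-precision overflow or underflow is not handled.

```python
import numpy as np

# Assumed precision ladder, highest precision first. The last level has unit
# roundoff 1, which here means "drop the entry" (nothing is stored).
DTYPES = (np.float64, np.float32, np.float16, None)
UNIT_ROUNDOFFS = (2.0**-53, 2.0**-24, 2.0**-11, 1.0)

def bucket_rows(M, eps):
    """Return buckets[i][k]: column indices of row i assigned to level k."""
    q = len(UNIT_ROUNDOFFS)
    buckets = []
    for row in M:
        beta = np.max(np.abs(row))  # assumed row scale: infinity norm of the row
        row_buckets = [[] for _ in range(q)]
        for j in np.flatnonzero(row):
            k = q - 1  # default: the smallest-magnitude interval
            for level in range(q - 1):
                # An entry stays at `level` if it exceeds eps * beta divided
                # by the unit roundoff of the next (lower) precision.
                if abs(row[j]) > eps * beta / UNIT_ROUNDOFFS[level + 1]:
                    k = level
                    break
            row_buckets[k].append(int(j))
        buckets.append(row_buckets)
    return buckets

def bucketed_spmv(M, x, buckets):
    """Compute y ~= M @ x with per-bucket partial sums in reduced precision."""
    y = np.zeros(M.shape[0], dtype=np.float64)
    for i, row_buckets in enumerate(buckets):
        for k, cols in enumerate(row_buckets):
            dtype = DTYPES[k]
            if dtype is None or not cols:
                continue  # dropped bucket or empty bucket
            partial = np.dot(M[i, cols].astype(dtype), x[cols].astype(dtype))
            y[i] += float(partial)  # accumulate partial sums in double
    return y

# Tiny hand-built example (hypothetical data).
M = np.array([[100.0, 0.5, 1e-4, 1e-8],
              [2.0, 0.0, 1e-10, 1.0]])
x = np.ones(4)
eps = 2.0**-30
buckets = bucket_rows(M, eps)
y = bucketed_spmv(M, x, buckets)
```

In this example the first row spreads its four entries over all four levels, while the second row keeps its two large entries in double and drops the tiny one. The result differs from the double-precision product by a small multiple of the threshold times the row scale.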
2. Mixed-Precision GMRES-IR Framework with Bucketed SPAI
Bucketed SPAI-GMRES-IR embeds this adaptive-precision preconditioner into a five-precision GMRES-based iterative refinement. The relevant precisions are:
- $u_f$ for preconditioner construction,
- $u_r$ for residual computation,
- $u$ for working storage,
- $u_g$ for GMRES arithmetic,
- $u_p$ for preconditioned matrix-vector products.
The preconditioner application (a product of $M$ with a vector) in each GMRES step uses the bucketed SpMV described above. All other core Krylov and orthogonalization operations remain in high precision to ensure algorithmic stability. The iterative refinement for a system $Ax = b$ proceeds as follows:
- Compute the SPAI preconditioner $M$ in the construction precision $u_f$.
- Bucket the entries of $M$ according to the adaptive-precision rules.
- Compute an initial solution $x_0$.
- For each outer (refinement) iteration $i = 0, 1, \dots$, up to a prescribed maximum:
  - Compute the residual $r_i = b - A x_i$ in the residual precision $u_r$.
  - Use left-preconditioned GMRES to solve $M A d_i = M r_i$ to a prescribed tolerance.
  - Update $x_{i+1} = x_i + d_i$ in the working precision $u$.
Backward and forward error checks determine convergence (Khan et al., 2023, Jiao, 2023).
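The refinement loop above can be sketched in NumPy as follows. This is a simplified illustration under stated assumptions, not the published algorithm. Only two precisions are distinguished: `np.longdouble` for the residual, which is no wider than double on some platforms, and double for everything else. The initial guess is taken as the preconditioner applied to the right-hand side, orthogonalization uses modified Gram-Schmidt, and the preconditioner is passed in as a callable `apply_M` that could wrap a bucketed SpMV. The function names and default tolerances are hypothetical.

```python
import numpy as np

def gmres(apply_op, rhs, tol=1e-4, max_iter=50):
    """Unrestarted GMRES with modified Gram-Schmidt, all in float64."""
    n = rhs.size
    beta = np.linalg.norm(rhs)
    if beta == 0.0:
        return np.zeros(n), 0
    V = np.zeros((n, max_iter + 1))
    H = np.zeros((max_iter + 1, max_iter))
    V[:, 0] = rhs / beta
    for j in range(max_iter):
        w = apply_op(V[:, j])
        for i in range(j + 1):            # orthogonalize against the basis
            H[i, j] = V[:, i] @ w
            w = w - H[i, j] * V[:, i]
        H[j + 1, j] = np.linalg.norm(w)
        e1 = np.zeros(j + 2)
        e1[0] = beta
        Hj = H[:j + 2, :j + 1]
        y, *_ = np.linalg.lstsq(Hj, e1, rcond=None)  # small least-squares problem
        if np.linalg.norm(e1 - Hj @ y) <= tol * beta or H[j + 1, j] == 0.0:
            break
        V[:, j + 1] = w / H[j + 1, j]
    return V[:, :j + 1] @ y, j + 1

def spai_gmres_ir(A, b, apply_M, max_refine=10, gmres_tol=1e-4, target=1e-12):
    """Iterative refinement with left-preconditioned GMRES as inner solver."""
    x = apply_M(b)                        # assumed initial guess
    A_hi, b_hi = A.astype(np.longdouble), b.astype(np.longdouble)
    norm_A, norm_b = np.linalg.norm(A, np.inf), np.linalg.norm(b, np.inf)
    for step in range(max_refine + 1):
        # Residual in (possibly) extended precision, then rounded to double.
        r = (b_hi - A_hi @ x.astype(np.longdouble)).astype(np.float64)
        berr = np.linalg.norm(r, np.inf) / (norm_A * np.linalg.norm(x, np.inf) + norm_b)
        if berr <= target or step == max_refine:
            return x, step, berr
        # Inner solve of M A d = M r, then update the iterate.
        d, _ = gmres(lambda v: apply_M(A @ v), apply_M(r), tol=gmres_tol)
        x = x + d
```

A caller would supply `apply_M` as a closure over the preconditioner, for example `lambda v: M_approx @ v` for a dense stand-in. The function returns the iterate, the number of refinement steps taken, and the last normwise backward error.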
3. Convergence Guarantees and Stability Analysis
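The bucketed SpMV acts as an exact product with a perturbed preconditioner $\widehat{M} = M + \Delta M$, where $\Delta M$ is bounded relative to $M$ by a constant multiple of the bucket threshold $\varepsilon$, and the constant depends mildly on the partition sizes. If $\varepsilon$ is sufficiently small, the additional error from bucketed evaluation does not degrade convergence relative to uniform-precision SPAI-GMRES-IR. Specifically, for suitably chosen precisions and bucket threshold, GMRES with the bucketed $\widehat{M}$ retains the convergence behavior of the uniform-precision case, and the GMRES-IR outer iteration converges with backward/forward error at the level of the working precision under the same spectral conditions as uniform-precision preconditioning (Khan et al., 2023).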
Adopting the essential-forward-and-backward stability (EFBS) paradigm, bucketed SPAI-GMRES-IR attains forward and backward error bounds in practical scenarios that are independent of the condition number $\kappa(A)$, provided the underlying problem is well-posed and residuals are computed in high precision (Jiao, 2023).
4. Computational Cost, Storage, and Precision Allocation
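Memory cost for the bucketed preconditioner is
$$\mathrm{Mem}_{\mathrm{bucketed}} = \sum_{k=1}^{q} N_k\, b_k,$$
with $N_k$ the total number of entries stored in precision $u_k$ (where $k = 1$ is the highest precision) and $b_k$ the storage cost per entry. For the uniform-precision case:
$$\mathrm{Mem}_{\mathrm{uniform}} = \mathrm{nnz}(M)\, b_1.$$
The storage reduction ratio,
$$\rho = \frac{\sum_{k=1}^{q} N_k\, b_k}{\mathrm{nnz}(M)\, b_1},$$
quantifies memory gain. Application of $M$ to a vector costs
$$\mathrm{Cost}_{\mathrm{apply}} = \sum_{k=1}^{q} N_k\, c_k,$$
where $c_k$ is the per-operation compute cost at precision $u_k$. Typically $c_k < c_1$ for $k > 1$, enabling significant runtime and energy savings. Parameter selection (number of buckets, thresholds, precision levels) is guided by the preconditioner’s spectrum, hardware capabilities, and the target error budget (Khan et al., 2023, Jiao, 2023).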
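As a worked check of the storage ratio, the following sketch evaluates it for the per-bucket entry counts listed for steam1 in the results table. The per-entry widths (64, 32, and 16 bits, plus a final bucket of dropped entries costing nothing) are an assumption; under that assumption the computed ratios agree with the tabulated 0.749 and 0.426. The relative compute costs passed to `apply_cost` are purely hypothetical.

```python
# Assumed storage cost per entry in bits: double, single, half, and a final
# bucket of dropped entries that costs nothing.
BITS_PER_ENTRY = (64, 32, 16, 0)

def storage_ratio(bucket_counts, bits=BITS_PER_ENTRY):
    """Bucketed storage divided by storing every entry at the top precision."""
    nnz = sum(bucket_counts)
    bucketed = sum(n * b for n, b in zip(bucket_counts, bits))
    return bucketed / (nnz * bits[0])

def apply_cost(bucket_counts, unit_costs):
    """Sum over buckets of (entries in bucket) * (cost per operation)."""
    return sum(n * c for n, c in zip(bucket_counts, unit_costs))

print(round(storage_ratio((1105, 0, 0, 0)), 3))       # 1.0
print(round(storage_ratio((556, 537, 12, 0)), 3))     # 0.749
print(round(storage_ratio((242, 284, 347, 232)), 3))  # 0.426

# Hypothetical relative costs: single half as expensive as double, and so on.
HYPOTHETICAL_COSTS = (1.0, 0.5, 0.25, 0.0)
print(apply_cost((556, 537, 12, 0), HYPOTHETICAL_COSTS))  # 827.5, versus 1105.0 uniform
```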
5. Numerical Results and Empirical Trade-offs
Extensive experiments on SuiteSparse matrices and synthetic ill-conditioned systems demonstrate key trade-offs:
| Matrix | Method | | nnz (per bucket) | Storage ratio | GMRES iterations (per refinement step) |
|---|---|---|---|---|---|
| steam1 | SPAI | 1.5 | 1105 (1105,0,0,0) | 1.00 | 14 (7,7) |
| steam1 | BSPAI | | 1105 (556,537,12,0) | 0.749 | 21 (7,7,7) |
| steam1 | BSPAI | | 1105 (242,284,347,232) | 0.426 | 21 (7,7,7) |
Highlights:
- Where uniform SPAI-GMRES-IR converges, bucketed SPAI-GMRES-IR also converges within a comparable number of iterations for small bucket thresholds.
- Storage for the preconditioner can be reduced by up to 60% with only a mild increase in GMRES iterations.
- For sufficiently small thresholds, the iteration count remains almost unchanged, with moderate storage reduction.
For randsvd and real-world matrices, the method achieves backward and forward errors on the order of the working precision. Preconditioner application costs are substantially reduced, and wall time is 2–5× lower than for double-precision direct solves, with energy savings accruing from the cheap low-precision matvecs (Khan et al., 2023, Jiao, 2023).
6. Implementation and Practical Recommendations
Effective implementation of bucketed SPAI-GMRES-IR requires:
- Contiguous storage layouts for each bucket, enabling efficient dispatch to SIMD or GPU kernels for each precision.
- Specialized mixed-precision BLAS kernels for bucketed SpMV in half, single, and double precisions.
- Orthogonalization in high precision (e.g., classical Gram–Schmidt in double) to maintain stability.
- Parallel execution that exploits the row-wise independence of bucketed SPAI matvecs and fuses bucketed computations to minimize synchronization overhead.
- For communication-avoiding variants in distributed-memory contexts, global reductions that are fused and carried out in high precision, which is sufficient.
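One possible contiguous per-bucket layout is sketched below in NumPy. It is an illustration rather than a layout prescribed by the cited works. Entries arrive in coordinate form with a precomputed bucket index per entry, and each bucket becomes three contiguous arrays (rows, columns, typed values) sorted by row, so that a kernel specialized for one precision can stream through it. The names `pack_buckets` and `packed_spmv`, the three-level dtype ladder, and the choice to accumulate products in double are assumptions.

```python
import numpy as np

LEVEL_DTYPES = (np.float64, np.float32, np.float16)  # assumed precision ladder

def pack_buckets(rows, cols, vals, levels):
    """Group COO entries by bucket into contiguous, row-sorted typed arrays.

    levels[e] is the bucket index of entry e. Entries whose level has no
    dtype in LEVEL_DTYPES (a "dropped" bucket) are not stored at all.
    """
    packed = []
    for k, dtype in enumerate(LEVEL_DTYPES):
        sel = np.flatnonzero(levels == k)
        order = sel[np.argsort(rows[sel], kind="stable")]
        packed.append((rows[order].astype(np.int32),
                       cols[order].astype(np.int32),
                       vals[order].astype(dtype)))
    return packed

def packed_spmv(packed, x, n_rows):
    """y ~= M @ x, one pass per bucket over its contiguous arrays."""
    y = np.zeros(n_rows, dtype=np.float64)
    for r, c, v in packed:
        if v.size == 0:
            continue
        prod = v * x[c].astype(v.dtype)  # products formed in the bucket precision
        # Simplification: row sums are accumulated in double by bincount.
        y += np.bincount(r, weights=prod, minlength=n_rows)
    return y

# Hypothetical 2 x 4 example with a precomputed level per entry.
rows = np.array([0, 0, 0, 0, 1, 1, 1])
cols = np.array([0, 1, 2, 3, 0, 2, 3])
vals = np.array([100.0, 0.5, 1e-4, 1e-8, 2.0, 1e-10, 1.0])
levels = np.array([0, 1, 2, 3, 0, 3, 0])
packed = pack_buckets(rows, cols, vals, levels)
y = packed_spmv(packed, np.ones(4), n_rows=2)
```

Keeping each bucket in its own typed arrays means a single kernel launch or SIMD loop handles one precision at a time, at the price of one pass over the output vector per bucket.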
This architecture integrates readily into MPI+OpenMP or CUDA libraries, yielding EFBS-certified sparse solvers suitable for large-scale, mixed-precision HPC deployments (Jiao, 2023).
7. Theoretical and Practical Significance
Bucketed SPAI-GMRES-IR enables new forms of adaptive-precision preconditioning with provable error guarantees matching those of standard uniform-precision methods, contingent on regime-appropriate parameter selection. Empirical and theoretical analyses indicate that, under well-posedness and with proper bucket thresholding, the method is robust to the degradation often associated with low-precision arithmetic, facilitating energy and cost savings without sacrificing solution quality. A plausible implication is that further hardware trends toward mixed-precision support will amplify these gains for large-scale, sparse scientific computing (Khan et al., 2023, Jiao, 2023).