
Bucketed SPAI-GMRES-IR: Mixed-Precision Solver

Updated 11 May 2026
  • Bucketed SPAI-GMRES-IR is a mixed-precision Krylov subspace method that leverages adaptive bucketed sparse approximate inverse preconditioning to efficiently solve sparse systems.
  • It partitions the preconditioner’s entries into buckets assigned to different precision levels, balancing computational cost with accuracy.
  • Empirical results demonstrate up to 60% storage reduction and 2–5× runtime speedup while maintaining backward and forward error bounds equivalent to uniform-precision approaches.

Bucketed SPAI-GMRES-IR is a mixed-precision Krylov subspace method for solving sparse linear systems. It uses bucketed (adaptive-precision) sparse approximate inverse (SPAI) preconditioning within a GMRES-based iterative refinement (IR) framework. The method aims to reduce computational cost and memory consumption by exploiting recent hardware support for multiple floating-point precisions, while retaining accuracy guarantees equivalent to those of uniform-precision approaches in specified condition-number regimes (Khan et al., 2023, Jiao, 2023).

1. Construction of the Bucketed SPAI Preconditioner

Let $A \in \mathbb{R}^{n\times n}$ be a nonsingular matrix, and let $M \approx A^{-1}$ denote a right preconditioner computed using the Frobenius-norm SPAI algorithm. Once $M$ is constructed (typically in higher precision), its nonzero entries are partitioned into “buckets” according to their magnitudes, with each bucket assigned a corresponding precision level.
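To fix ideas, here is a minimal dense-matrix sketch of Frobenius-norm SPAI with a static sparsity pattern. It minimizes $\|AM - I\|_F$ one column at a time using NumPy's `lstsq`. It assumes that column $k$ of $M$ may be nonzero only where column $k$ of $A$ is nonzero and that $A$ has a nonzero diagonal. Adaptive-pattern SPAI variants instead grow the pattern iteratively, so this is not necessarily the variant used in the cited work. The function name `spai_static` is hypothetical.

```python
import numpy as np


def spai_static(A: np.ndarray) -> np.ndarray:
    """Static-pattern Frobenius-norm SPAI: minimize ||A M - I||_F column by column.

    Column k of M may be nonzero only where column k of A is nonzero.
    Assumes A has a nonzero diagonal, so row k always appears in the reduced system.
    """
    n = A.shape[0]
    M = np.zeros((n, n), dtype=np.float64)
    for k in range(n):
        J = np.nonzero(A[:, k])[0]                # allowed nonzero rows of column k of M
        I = np.nonzero(A[:, J].any(axis=1))[0]    # rows of A touched by the columns J
        e_hat = (I == k).astype(np.float64)       # e_k restricted to the rows I
        # Rows outside I contribute only a constant to ||A m_k - e_k||_2, so this
        # small least-squares problem has the same minimizer as the full one.
        m_J, *_ = np.linalg.lstsq(A[np.ix_(I, J)], e_hat, rcond=None)
        M[J, k] = m_J
    return M
```

For a diagonal $A$ the sketch returns $A^{-1}$ exactly. For a tridiagonal $A$ it returns the best tridiagonal approximate inverse in the Frobenius norm.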

Given $q \geq 2$ precisions with unit roundoffs $u_1 < u_2 < \cdots < u_q$ (ordered from most to least accurate, which is the ordering under which the intervals below are nonempty) and a target accuracy $\epsilon_B \geq u_1$, define for each row $i$ and for $k = 1, \ldots, q$ the intervals

  • $P_{i1} = (\epsilon_B\|M\|/u_2,\ +\infty)$,
  • $P_{ik} = (\epsilon_B\|M\|/u_{k+1},\ \epsilon_B\|M\|/u_k]$ for $k = 2, \ldots, q-1$,
  • $P_{iq} = (0,\ \epsilon_B\|M\|/u_q]$.
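The interval test can be sketched in Python as follows. The sketch assumes the roundoffs are supplied from the most accurate format to the least accurate ($u_1 < u_2 < \cdots < u_q$). It takes $\|M\|$ as an input and defaults to the $\infty$-norm; that default is an assumption, since the norm is not specified here. The helper names `bucket_index` and `bucket_matrix` are hypothetical.

```python
import numpy as np


def bucket_index(m_ij: float, eps_B: float, norm_M: float, u: list) -> int:
    """Return the 0-based bucket k such that |m_ij| lies in P_{i,k+1}.

    Assumes u[0] < u[1] < ... < u[q-1] (u[0] is the most accurate format).
    Bucket 0: (eps_B*||M||/u[1], inf); bucket k: (eps_B*||M||/u[k+1], eps_B*||M||/u[k]];
    last bucket: (0, eps_B*||M||/u[q-1]].
    """
    a = abs(m_ij)
    for k in range(len(u) - 1):
        if a > eps_B * norm_M / u[k + 1]:   # too large for the next, less accurate format
            return k
    return len(u) - 1


def bucket_matrix(M: np.ndarray, eps_B: float, u: list, norm_M=None) -> np.ndarray:
    """Bucket index for every nonzero of M; -1 marks structural zeros."""
    if norm_M is None:
        norm_M = np.linalg.norm(M, np.inf)  # assumed choice of norm
    B = np.full(M.shape, -1, dtype=int)
    rows, cols = np.nonzero(M)
    for i, j in zip(rows, cols):
        B[i, j] = bucket_index(M[i, j], eps_B, norm_M, u)
    return B
```

With `u = [1e-16, 1e-8, 1e-4]`, `eps_B = 1e-12` and $\|M\| = 1$, entries above $10^{-4}$ land in bucket 0, entries in $(10^{-8}, 10^{-4}]$ in bucket 1, and smaller ones in bucket 2. Bucket sizes follow from `np.bincount(B[B >= 0], minlength=len(u))`.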

The bucket $B_{ik}$ for row $i$ and precision $k$ consists of the column indices $j$ such that $|m_{ij}| \in P_{ik}$. After bucketing, a SpMV $y = Mx$ is evaluated as follows. For each row $i$ and each $k = 1, \ldots, q$, the partial sum $y_i^{(k)} = \sum_{j \in B_{ik}} m_{ij}x_j$ is computed in precision $u_k$. The partial sums are then combined as $y_i = \sum_{k=1}^{q} y_i^{(k)}$ in the highest available precision (Khan et al., 2023).
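The per-bucket evaluation of $y = Mx$ can be emulated with NumPy scalar types. This sketch assumes bucket 0 is double, bucket 1 single and bucket 2 half precision. Every product and partial sum in a bucket is rounded to that bucket's format, and the partial sums are combined in double. The function `bucketed_spmv` is hypothetical and deliberately unoptimized (dense storage, Python loops).

```python
import numpy as np

# Assumed mapping: bucket 0 -> double, bucket 1 -> single, bucket 2 -> half.
DTYPES = [np.float64, np.float32, np.float16]


def bucketed_spmv(M, B, x, dtypes=DTYPES):
    """y = M x with each bucket's partial sum y_i^(k) evaluated in its own format.

    M: dense array holding the nonzeros; B: bucket index per entry (-1 = zero);
    x: input vector in float64.
    """
    n = M.shape[0]
    y = np.zeros(n, dtype=np.float64)
    for i in range(n):
        partial = []
        for k, dt in enumerate(dtypes):
            cols = np.nonzero(B[i] == k)[0]
            s = dt(0)
            for j in cols:
                s = dt(s + dt(M[i, j]) * dt(x[j]))  # product and sum rounded to format k
            partial.append(np.float64(s))
        y[i] = np.sum(partial, dtype=np.float64)    # combine partial sums in double
    return y
```

Putting an entry such as 0.1 into the half-precision bucket changes the result in roughly the fourth significant digit. Entries exactly representable in half precision are reproduced exactly.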

2. Mixed-Precision GMRES-IR Framework with Bucketed SPAI

Bucketed SPAI-GMRES-IR embeds this adaptive-precision preconditioner into a five-precision GMRES-based iterative refinement. The relevant precisions are:

  • $u_f$ for preconditioner construction,
  • $u_r$ for residual computation,
  • $u$ for working storage,
  • $u_g$ for GMRES arithmetic,
  • $u_p$ for products of the preconditioned matrix $MA$ with vectors.
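The five-precision configuration can be made concrete as a small record of NumPy dtypes together with their unit roundoffs $u = \varepsilon_{\mathrm{mach}}/2$. The assignment below (single-precision working storage, double elsewhere) is only one example. The class name `GmresIrPrecisions` and its field names are hypothetical labels that follow the common five-precision GMRES-IR notation.

```python
from dataclasses import dataclass

import numpy as np


def unit_roundoff(dtype) -> float:
    """Unit roundoff u = machine epsilon / 2 for a binary floating-point format."""
    return float(np.finfo(dtype).eps) / 2


@dataclass(frozen=True)
class GmresIrPrecisions:
    """Illustrative assignment of the five GMRES-IR precisions."""
    u_f: type = np.float64  # preconditioner (SPAI) construction
    u_r: type = np.float64  # residual computation
    u: type = np.float32    # working storage of the solution
    u_g: type = np.float64  # GMRES arithmetic
    u_p: type = np.float64  # products with the preconditioned matrix MA

    def roundoffs(self) -> dict:
        return {name: unit_roundoff(getattr(self, name))
                for name in ("u_f", "u_r", "u", "u_g", "u_p")}
```

For example, `GmresIrPrecisions().roundoffs()["u"]` gives $2^{-24}$, and `unit_roundoff(np.float16)` gives $2^{-11}$.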

Each application of the preconditioner ($v \mapsto Mv$) within a GMRES step uses the bucketed SpMV described above. All other core Krylov and orthogonalization operations remain in high precision to ensure algorithmic stability. The iterative refinement proceeds as follows:

  1. Compute $M \approx A^{-1}$ (SPAI) in the preconditioner-construction precision $u_f$.
  2. Bucket the entries of $M$ according to the adaptive-precision rules.
  3. Compute the initial solution $x_0 = Mb$.
  4. For each outer iteration $i = 0, 1, 2, \ldots$ until convergence:
    • Compute the residual $r_i = b - Ax_i$ in the residual precision $u_r$.
    • Use left-preconditioned GMRES to solve $MAd_i = Mr_i$ to tolerance $\tau$, with GMRES arithmetic in $u_g$ and products with $MA$ in $u_p$.
    • Update $x_{i+1} = x_i + d_i$ in the working precision $u$.
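The refinement loop can be sketched compactly in Python under several assumptions. The preconditioner is passed as a callable `apply_M`, so a bucketed SpMV could be plugged in. The solution is stored in single precision (the working precision), while residuals and GMRES run in double. The loop stops once the normwise backward error falls to the single-precision unit roundoff. The GMRES routine is a plain unrestarted Arnoldi/modified Gram–Schmidt implementation that re-solves a small least-squares problem at every step, chosen for brevity rather than efficiency. All names are hypothetical.

```python
import numpy as np


def gmres(apply_op, rhs, tol=1e-10, maxiter=50):
    """Unrestarted GMRES from x0 = 0 (Arnoldi with modified Gram-Schmidt), in float64."""
    n = rhs.size
    beta = np.linalg.norm(rhs)
    if beta == 0.0:
        return np.zeros(n), 0
    V = np.zeros((n, maxiter + 1))
    H = np.zeros((maxiter + 1, maxiter))
    V[:, 0] = rhs / beta
    y = np.zeros(0)
    for j in range(maxiter):
        w = apply_op(V[:, j])
        for i in range(j + 1):                     # modified Gram-Schmidt
            H[i, j] = V[:, i] @ w
            w = w - H[i, j] * V[:, i]
        H[j + 1, j] = np.linalg.norm(w)
        g = np.zeros(j + 2)
        g[0] = beta
        y, *_ = np.linalg.lstsq(H[:j + 2, :j + 1], g, rcond=None)  # min ||beta e1 - H y||
        if np.linalg.norm(g - H[:j + 2, :j + 1] @ y) <= tol * beta or H[j + 1, j] == 0.0:
            return V[:, :j + 1] @ y, j + 1
        V[:, j + 1] = w / H[j + 1, j]
    return V[:, :maxiter] @ y, maxiter


def gmres_ir(A, b, apply_M, tau=1e-10, max_refine=10):
    """Left-preconditioned GMRES-IR: x in float32, residual and GMRES in float64."""
    u = np.finfo(np.float32).eps / 2                   # working-precision unit roundoff
    A64, b64 = A.astype(np.float64), b.astype(np.float64)
    normA = np.linalg.norm(A64, np.inf)
    x = apply_M(b64).astype(np.float32)                # x0 = M b, stored in float32
    for it in range(max_refine):
        r = b64 - A64 @ x.astype(np.float64)           # residual in double
        nbe = np.linalg.norm(r, np.inf) / (normA * np.linalg.norm(x, np.inf)
                                           + np.linalg.norm(b64, np.inf))
        if nbe <= u:                                   # normwise backward-error test
            return x, it
        d, _ = gmres(lambda v: apply_M(A64 @ v), apply_M(r), tol=tau)  # M A d = M r
        x = (x.astype(np.float64) + d).astype(np.float32)  # update, stored in float32
    return x, max_refine
```

As a usage example, a Jacobi scaling `apply_M = lambda v: v / np.diag(A)` can stand in for SPAI on a diagonally dominant test matrix. Replacing it with a bucketed SpMV changes only that callable.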

Backward and forward error checks determine convergence (Khan et al., 2023, Jiao, 2023).

3. Convergence Guarantees and Stability Analysis

The bucketed SpMV induces a perturbed preconditioner $\hat{M} = M + \Delta M$, with $\|\Delta M\| \leq c\,\epsilon_B\|M\|$, where the constant $c$ depends mildly on the partition sizes. If $\epsilon_B$ is sufficiently small, the additional error from bucketed evaluation does not degrade convergence relative to uniform-precision SPAI-GMRES-IR. Specifically, for suitably chosen precisions and bucket threshold, GMRES with the bucketed $\hat{M}$ satisfies a backward-error bound for the preconditioned correction equation of the same kind as in the uniform-precision case. The GMRES-IR outer iteration then converges with backward and forward errors of the order of the working precision $u$, under the same spectral conditions as uniform-precision preconditioning (Khan et al., 2023).
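A short worked bound shows where the threshold $\epsilon_B$ enters, at least for the part of the perturbation due to storing each entry in its bucket's precision. The derivation assumes $u_1 < u_2 < \cdots < u_q$ with $u_1 \le \epsilon_B$, round-to-nearest storage, and a norm satisfying $|m_{ij}| \le \|M\|$ (true for the 1-, 2-, $\infty$- and Frobenius norms). Rounding errors from the products and partial sums add further terms of the same order, which are not shown.

```latex
\begin{aligned}
&\text{Rounding } m_{ij} \text{ to precision } u_k:\quad \hat m_{ij} = m_{ij}(1+\delta_{ij}),\quad |\delta_{ij}| \le u_k,\\
&k \ge 2:\quad |m_{ij}| \le \frac{\epsilon_B\|M\|}{u_k} \;\Longrightarrow\; |\hat m_{ij} - m_{ij}| \le u_k|m_{ij}| \le \epsilon_B\|M\|,\\
&k = 1:\quad |\hat m_{ij} - m_{ij}| \le u_1|m_{ij}| \le u_1\|M\| \le \epsilon_B\|M\|,\\
&\text{so every stored entry satisfies}\quad |\hat m_{ij} - m_{ij}| \le \epsilon_B\|M\|.
\end{aligned}
```

Treating a bucket of dropped entries as a "format" with $u_k = 1$ (so $\hat m_{ij} = 0$ and $\delta_{ij} = -1$) fits the same bound, because such entries satisfy $|m_{ij}| \le \epsilon_B\|M\|$.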

Within the essential-forward-and-backward stability (EFBS) paradigm, bucketed SPAI-GMRES-IR attains forward and backward error bounds in practical scenarios that are comparable to those of uniform-precision solvers. This holds provided the underlying problem is well-posed and residuals are computed in high precision (Jiao, 2023).

4. Computational Cost, Storage, and Precision Allocation

Memory cost for the bucketed preconditioner is

$$\mathrm{Mem}_{\mathrm{bucket}} = \sum_{k=1}^{q} N_k\, b_k,$$

with $N_k$ the total number of entries stored in precision $u_k$ and $b_k$ the storage cost per entry in that precision. In the uniform-precision case, every entry is stored in the highest precision at a cost of $b_{\mathrm{high}}$ per entry:

$$\mathrm{Mem}_{\mathrm{uniform}} = \mathrm{nnz}(M)\, b_{\mathrm{high}}.$$

The storage reduction ratio

$$\rho = \frac{\mathrm{Mem}_{\mathrm{bucket}}}{\mathrm{Mem}_{\mathrm{uniform}}}$$

quantifies the memory gain (a smaller $\rho$ means larger savings). Applying the bucketed preconditioner to a vector costs

$$\mathrm{Cost} = \sum_{k=1}^{q} N_k\, c_k,$$

where $c_k$ is the per-operation compute cost at precision $u_k$. Typically $c_k$ is much smaller for the low-precision buckets than for the highest precision, enabling significant runtime and energy savings. Parameter selection (number of buckets, thresholds, precision levels) is guided by the preconditioner’s spectrum, hardware capabilities, and the target error budget (Khan et al., 2023, Jiao, 2023).
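The storage and cost formulas reduce to a few lines of arithmetic. As a consistency check, the storage ratios 0.749 and 0.426 reported for steam1 in the table in Section 5 can be reproduced from the bucket sizes. This requires assuming 8, 4 and 2 bytes for the first three buckets (double, single, half) and 0 bytes for the fourth, i.e. that the last bucket's entries are dropped. That byte assignment is an inference from the numbers, not something the text states. The relative per-operation costs in the cost example are made-up illustrative values, and the function names are hypothetical.

```python
def bucketed_memory(bucket_counts, bytes_per_entry):
    """Mem_bucket = sum_k N_k * b_k (bytes)."""
    return sum(n * b for n, b in zip(bucket_counts, bytes_per_entry))


def storage_ratio(bucket_counts, bytes_per_entry, uniform_bytes):
    """rho = Mem_bucket / Mem_uniform, with Mem_uniform = nnz(M) * uniform_bytes."""
    nnz = sum(bucket_counts)
    return bucketed_memory(bucket_counts, bytes_per_entry) / (nnz * uniform_bytes)


def apply_cost(bucket_counts, cost_per_op):
    """Cost of one bucketed application: sum_k N_k * c_k (relative units)."""
    return sum(n * c for n, c in zip(bucket_counts, cost_per_op))


# Assumed byte widths: double, single, half, and 0 for a dropped last bucket.
WIDTHS = (8, 4, 2, 0)

for counts in [(1105, 0, 0, 0), (556, 537, 12, 0), (242, 284, 347, 232)]:
    # prints 1.0, 0.749 and 0.426 for the three steam1 rows
    print(counts, round(storage_ratio(counts, WIDTHS, uniform_bytes=8), 3))
```

With hypothetical relative costs `(1.0, 0.5, 0.25, 0.0)`, `apply_cost((556, 537, 12, 0), ...)` evaluates to 827.5 units, versus 1105 for the uniform-double preconditioner.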

5. Numerical Results and Empirical Trade-offs

Extensive experiments on SuiteSparse matrices and synthetic ill-conditioned systems demonstrate key trade-offs:

Matrix | Method | nnz(M) | Bucket sizes | Storage ratio | GMRES iterations, total (per refinement)
steam1 | SPAI | 1105 | (1105, 0, 0, 0) | 1.00 | 14 (7, 7)
steam1 | BSPAI, smaller $\epsilon_B$ | 1105 | (556, 537, 12, 0) | 0.749 | 21 (7, 7, 7)
steam1 | BSPAI, larger $\epsilon_B$ | 1105 | (242, 284, 347, 232) | 0.426 | 21 (7, 7, 7)

Highlights:

  • Where uniform SPAI-GMRES-IR converges, bucketed SPAI-GMRES-IR also converges within a comparable number of iterations for small $\epsilon_B$.
  • Storage for the preconditioner can be reduced by up to 60% with only a mild increase in GMRES iterations.
  • For the more conservative (smaller) thresholds $\epsilon_B$, the iteration count remains almost unchanged, with a more moderate storage reduction.

For randsvd and real-world matrices, the method achieves backward and forward errors on the order of the working-precision unit roundoff. Preconditioner application costs are substantially reduced. Wall-clock time is 2–5× lower than that of double-precision direct solves, with energy savings coming from the cheap low-precision matvecs (Khan et al., 2023, Jiao, 2023).

6. Implementation and Practical Recommendations

Effective implementation of bucketed SPAI-GMRES-IR requires:

  • Contiguous storage layouts for each bucket, enabling efficient dispatch to SIMD or GPU kernels for each precision.
  • Specialized mixed-precision BLAS kernels for bucketed SpMV in half, single, and double precisions.
  • Orthogonalization in high precision (e.g., classical Gram–Schmidt in double) to maintain stability.
  • Row-wise parallelism in the bucketed SPAI matvecs, with bucketed computations fused to minimize synchronization overhead.
  • For communication-avoiding variants in distributed memory, global reductions that are fused and performed in high precision; this is sufficient.
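The contiguous per-bucket layout in the first item can be illustrated by splitting $M$ into one CSR structure per bucket, each storing its values in that bucket's dtype. The SpMV then becomes one ordinary single-format CSR kernel per bucket followed by a double-precision sum. This is a NumPy sketch under assumed dtypes (double, single, half) with hypothetical function names. A production version would dispatch native SIMD or GPU kernels per format rather than use Python loops.

```python
import numpy as np

DTYPES = (np.float64, np.float32, np.float16)  # assumed bucket precisions


def split_csr_by_bucket(M, B, dtypes=DTYPES):
    """One CSR triple (rowptr, colidx, vals) per bucket, vals stored contiguously
    in that bucket's dtype. B holds bucket indices per entry (-1 = zero)."""
    n = M.shape[0]
    out = []
    for k, dt in enumerate(dtypes):
        rowptr = np.zeros(n + 1, dtype=np.int64)
        cols, vals = [], []
        for i in range(n):
            js = np.nonzero(B[i] == k)[0]
            cols.extend(js)
            vals.extend(M[i, js])
            rowptr[i + 1] = rowptr[i] + js.size
        out.append((rowptr, np.asarray(cols, dtype=np.int64), np.asarray(vals, dtype=dt)))
    return out


def csr_bucket_spmv(buckets, x):
    """Run one CSR kernel per bucket in its own dtype, then sum the results in float64."""
    n = buckets[0][0].size - 1
    y = np.zeros(n, dtype=np.float64)
    for rowptr, cols, vals in buckets:
        xk = x.astype(vals.dtype)                # cast x once per bucket
        yk = np.zeros(n, dtype=vals.dtype)
        for i in range(n):
            lo, hi = rowptr[i], rowptr[i + 1]
            yk[i] = np.dot(vals[lo:hi], xk[cols[lo:hi]])  # row result rounded to bucket dtype
        y += yk.astype(np.float64)               # accumulate bucket results in double
    return y
```

Summing whole bucket vectors in double is equivalent to the per-row formulation, because each $y_i$ still combines its $q$ partial sums in the highest precision. The layout simply makes each bucket's values and indices contiguous.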

This architecture integrates readily into MPI+OpenMP or CUDA libraries, yielding EFBS-certified sparse solvers suitable for large-scale, mixed-precision HPC deployments (Jiao, 2023).

7. Theoretical and Practical Significance

Bucketed SPAI-GMRES-IR enables new forms of adaptive-precision preconditioning with provable error guarantees matching those of standard uniform-precision methods, contingent on regime-appropriate parameter selection. Empirical and theoretical analyses indicate that, under well-posedness and with proper bucket thresholding, the method is robust to the degradation often associated with low-precision arithmetic, facilitating energy and cost savings without sacrificing solution quality. A plausible implication is that further hardware trends toward mixed-precision support will amplify these gains for large-scale, sparse scientific computing (Khan et al., 2023, Jiao, 2023).
