Bucketed SPAI-GMRES-IR: Mixed-Precision Solver

Updated 11 May 2026

Bucketed SPAI-GMRES-IR is a mixed-precision Krylov subspace method that leverages adaptive bucketed sparse approximate inverse preconditioning to efficiently solve sparse systems.
It partitions the preconditioner’s entries into buckets assigned to different precision levels, balancing computational cost with accuracy.
Empirical results demonstrate up to 60% storage reduction and 2–5× runtime speedup while maintaining backward and forward error bounds equivalent to uniform-precision approaches.

Bucketed SPAI-GMRES-IR is a mixed-precision Krylov subspace method for solving sparse linear systems, leveraging bucketed (adaptive-precision) sparse approximate inverse (SPAI) preconditioning within a GMRES-based iterative refinement (IR) framework. The method targets reduction in computational cost and memory consumption enabled by recent hardware trends supporting multiple floating-point precisions, while maintaining accuracy guarantees equivalent to uniform-precision approaches under specified condition number regimes (Khan et al., 2023, Jiao, 2023).

1. Construction of the Bucketed SPAI Preconditioner

Let $A \in \mathbb{R}^{n\times n}$ be a nonsingular matrix, and let $M \approx A^{-1}$ denote a right preconditioner computed using the Frobenius-norm SPAI algorithm. Once $M$ is constructed (typically in higher precision), its nonzero entries are partitioned into “buckets” according to their magnitudes, with each bucket assigned a corresponding precision level.

Given $q \geq 2$ precisions with increasing unit roundoffs $u_1 < u_2 < \ldots < u_q$ (so $u_1$ is the highest precision) and a target threshold $\epsilon_B \geq u_1$, define for each row $i$ and for $k=1,\ldots,q$ the intervals

$P_{i1} = (\epsilon_B\|M\|/u_2,\,+\infty)$ ,
$P_{ik} = (\epsilon_B\|M\|/u_{k+1},\ \epsilon_B\|M\|/u_k]$ , $k = 2,\ldots,q-1$ ,
$P_{iq} = [0,\ \epsilon_B\|M\|/u_q]$ .
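
A short worked bound, offered as a sketch rather than a statement taken from the cited papers, shows why the interval endpoints have the form $\epsilon_B\|M\|/u_k$. Take an entry $m_{ij}$ (notation assumed here for the entries of $M$) whose magnitude lies in $P_{ik}$ with $k \geq 2$, and assume the standard rounding model $\widehat{m}_{ij} = m_{ij}(1+\delta)$ with $|\delta| \leq u_k$, ignoring underflow. Then

$|\widehat{m}_{ij} - m_{ij}| \leq u_k\,|m_{ij}| \leq u_k \cdot \epsilon_B\|M\|/u_k = \epsilon_B\|M\|$ ,

so every entry stored in a lower precision is perturbed by at most $\epsilon_B\|M\|$ in absolute terms, whichever bucket it lands in.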

The bucket $B_{ik}$ for row $i$ and precision $u_k$ consists of column indices $j$ such that $|m_{ij}| \in P_{ik}$. After bucketing, a SpMV $y = Mx$ is evaluated by computing, for each $i$ and $k$, the partial sum $y_i^{(k)} = \sum_{j \in B_{ik}} m_{ij} x_j$ in precision $u_k$, then summing $y_i = \sum_{k=1}^{q} y_i^{(k)}$ in the highest precision $u_1$ (Khan et al., 2023).
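
The bucketing rule and the bucketed SpMV can be illustrated with a minimal NumPy sketch. It is not the implementation from the cited papers: the function names are hypothetical, $\|M\|$ is taken to be the Frobenius norm (an assumption), the matrix is stored densely for brevity, and only two precisions (double and single) are used so that exponent-range issues of half precision do not enter.

```python
import numpy as np


def bucket_indices(M, eps_b, dtypes):
    """0-based bucket index of every entry of M (bucket 0 = highest precision)."""
    unit_roundoffs = [np.finfo(t).eps / 2 for t in dtypes]
    norm_m = np.linalg.norm(M)  # assumption: Frobenius norm stands in for ||M||
    idx = np.zeros(M.shape, dtype=np.int64)
    for u in unit_roundoffs[1:]:
        # An entry at or below eps_b*||M||/u_k moves one bucket further down.
        idx += np.abs(M) <= eps_b * norm_m / u
    return idx


def bucketed_spmv(M, x, eps_b, dtypes=(np.float64, np.float32)):
    """Evaluate y = M x bucket by bucket, accumulating in the highest precision."""
    idx = bucket_indices(M, eps_b, dtypes)
    y = np.zeros(M.shape[0], dtype=dtypes[0])
    for k, t in enumerate(dtypes):
        Mk = np.where(idx == k, M, 0.0).astype(t)  # bucket k, rounded to precision k
        yk = Mk @ x.astype(t)                      # partial product in precision k
        y += yk.astype(dtypes[0])                  # sum partial results in precision 1
    return y


# Synthetic matrix whose entries span twelve orders of magnitude.
rng = np.random.default_rng(0)
n = 50
M = rng.choice([-1.0, 1.0], (n, n)) * 10.0 ** (-rng.uniform(0, 12, (n, n)))
x = rng.standard_normal(n)
eps_b = 2.0 ** -53
dtypes = (np.float64, np.float32)

counts = np.bincount(bucket_indices(M, eps_b, dtypes).ravel(), minlength=len(dtypes))
y = bucketed_spmv(M, x, eps_b, dtypes)
rel_err = np.linalg.norm(y - M @ x) / (np.linalg.norm(M) * np.linalg.norm(x))
print("entries per bucket:", counts, "relative error:", rel_err)
```

A production version would keep each bucket in its own sparse structure and would need scaling or range checks before adding a half-precision level.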

2. Mixed-Precision GMRES-IR Framework with Bucketed SPAI

Bucketed SPAI-GMRES-IR embeds this adaptive-precision preconditioner into a five-precision GMRES-based iterative refinement. The relevant precisions are:

$u_f$ for preconditioner construction,
$u_r$ for residual computation,
$u$ for working storage,
$u_g$ for GMRES arithmetic,
$u_p$ for preconditioned matrix-vector products.

The preconditioner application (a product with $M$) in each GMRES step uses the bucketed SpMV described above. All other core Krylov and orthogonalization operations remain in high precision to ensure algorithmic stability. The iterative refinement proceeds as follows:

Compute $M$ (SPAI) in precision $u_f$.
Bucket the entries of $M$ according to the adaptive-precision rules.
Compute the initial solution $x_0$.
For each outer iteration $i = 0, 1, \ldots$:
- Compute the residual $r_i = b - A x_i$ in precision $u_r$.
- Use left-preconditioned GMRES to solve $M A d_i = M r_i$ to a prescribed tolerance.
- Update $x_{i+1} = x_i + d_i$ in precision $u$.

Backward and forward error checks determine convergence (Khan et al., 2023, Jiao, 2023).
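
The refinement loop above can be sketched in NumPy as follows. This is an illustration under stated assumptions, not the authors' code: a single-precision dense inverse stands in for the bucketed SPAI preconditioner, the initial solution is taken as $M b$, residuals and updates use double precision, and the names `gmres`, `gmres_ir`, and `apply_M` are hypothetical.

```python
import numpy as np


def gmres(apply_op, rhs, tol=1e-4, max_iter=30):
    """Unrestarted GMRES for apply_op(d) = rhs with a zero initial guess."""
    n = rhs.size
    max_iter = min(max_iter, n)
    beta = np.linalg.norm(rhs)
    if beta == 0.0:
        return np.zeros(n)
    Q = np.zeros((n, max_iter + 1))
    H = np.zeros((max_iter + 1, max_iter))
    Q[:, 0] = rhs / beta
    for j in range(max_iter):
        w = apply_op(Q[:, j])
        for i in range(j + 1):  # modified Gram-Schmidt in double precision
            H[i, j] = Q[:, i] @ w
            w = w - H[i, j] * Q[:, i]
        H[j + 1, j] = np.linalg.norm(w)
        e1 = np.zeros(j + 2)
        e1[0] = beta
        Hj = H[: j + 2, : j + 1]
        y = np.linalg.lstsq(Hj, e1, rcond=None)[0]  # small least-squares problem
        if np.linalg.norm(Hj @ y - e1) <= tol * beta or H[j + 1, j] == 0.0:
            break
        Q[:, j + 1] = w / H[j + 1, j]
    return Q[:, : j + 1] @ y


def gmres_ir(A, b, M_low, outer=10, target=1e-13):
    """Iterative refinement with a low-precision approximate inverse M_low."""
    def apply_M(v):
        # Preconditioner applied in its own (lower) precision, result widened.
        return (M_low @ v.astype(M_low.dtype)).astype(np.float64)

    x = apply_M(b)  # assumption: initial solution x_0 = M b
    berr = np.inf
    for step in range(outer):
        r = b - A @ x  # residual in double precision
        berr = np.linalg.norm(r) / (
            np.linalg.norm(A) * np.linalg.norm(x) + np.linalg.norm(b)
        )
        if berr <= target:  # normwise backward error check
            break
        d = gmres(lambda v: apply_M(A @ v), apply_M(r))  # solve M A d = M r
        x = x + d
    return x, berr, step


rng = np.random.default_rng(0)
n = 60
A = 4.0 * np.eye(n) + rng.standard_normal((n, n)) / np.sqrt(n)
x_true = rng.standard_normal(n)
b = A @ x_true
M_low = np.linalg.inv(A.astype(np.float32))  # stand-in for a SPAI preconditioner

x, berr, steps = gmres_ir(A, b, M_low)
ferr = np.linalg.norm(x - x_true) / np.linalg.norm(x_true)
print("refinement steps:", steps, "backward error:", berr, "forward error:", ferr)
```

Replacing `apply_M` with a bucketed SpMV would give the structure described in this section; the test matrix here is deliberately well conditioned so that the loop converges in a few steps.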

3. Convergence Guarantees and Stability Analysis

The bucketed SpMV induces a perturbed preconditioner $\widehat{M} = M + \Delta M$, where the size of $\Delta M$ is controlled by $\epsilon_B$ and by a constant that depends mildly on the partition sizes. If $\epsilon_B$ is sufficiently small, the additional error from bucketed evaluation does not degrade convergence relative to uniform-precision SPAI-GMRES-IR. Specifically, for suitably chosen precisions and bucket threshold, GMRES with the bucketed $\widehat{M}$ retains the convergence behavior of the uniform-precision case, and the GMRES-IR outer iteration attains backward and forward errors equivalent to those of uniform-precision preconditioning under the same spectral conditions (Khan et al., 2023).

Under the essential-forward-and-backward stability (EFBS) paradigm, bucketed SPAI-GMRES-IR attains forward and backward error bounds in practical scenarios, provided the underlying problem is well-posed and residuals are computed in high precision (Jiao, 2023).

4. Computational Cost, Storage, and Precision Allocation

Memory cost for the bucketed preconditioner is

$S_{\mathrm{bucketed}} = \sum_{k=1}^{q} n_k\, s_k$ ,

with $n_k$ the total number of entries stored in precision $u_k$ and $s_k$ the storage cost per entry. For the uniform-precision case:

$S_{\mathrm{uniform}} = \mathrm{nnz}(M)\, s_1$ .

The storage reduction ratio,

$\rho = S_{\mathrm{bucketed}} / S_{\mathrm{uniform}}$ ,

quantifies memory gain. Application of the bucketed $M$ to a vector costs

$C_{\mathrm{apply}} = \sum_{k=1}^{q} n_k\, c_k$ ,

where $c_k$ is the per-operation compute cost at precision $u_k$. Typically $c_k < c_1$ for $k > 1$, enabling significant runtime and energy savings. Parameter selection (number of buckets, thresholds, precision levels) is guided by the preconditioner’s spectrum, hardware capabilities, and the target error budget (Khan et al., 2023, Jiao, 2023).
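
As a quick check of the storage model, the sketch below evaluates the ratio of bucketed to uniform storage for the steam1 bucket counts reported in Section 5. The per-entry sizes of 64, 32, 16, and 0 bits (double, single, half, and dropped entries) are an assumption; with that choice the computed values match the reported ratios. Index storage is ignored, and `weighted_ratio` is a hypothetical helper name.

```python
def weighted_ratio(bucket_counts, weights):
    """Weighted cost of the bucketed layout relative to all entries at weights[0]."""
    bucketed = sum(n * w for n, w in zip(bucket_counts, weights))
    uniform = sum(bucket_counts) * weights[0]
    return bucketed / uniform


# Assumed bits per stored value: double, single, half, dropped.
BITS = (64, 32, 16, 0)

ratio_uniform = weighted_ratio((1105, 0, 0, 0), BITS)
ratio_a = weighted_ratio((556, 537, 12, 0), BITS)
ratio_b = weighted_ratio((242, 284, 347, 232), BITS)

print(round(ratio_uniform, 3), round(ratio_a, 3), round(ratio_b, 3))
# prints: 1.0 0.749 0.426
```

The same helper covers the application cost if `weights` holds per-operation costs instead of bits, although real cost ratios depend on the hardware.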

5. Numerical Results and Empirical Trade-offs

Extensive experiments on SuiteSparse matrices and synthetic ill-conditioned systems demonstrate key trade-offs:

Matrix	Method		nnz (per bucket)	Storage ratio	GMRES iterations (per refinement step)
steam1	SPAI	1.5	1105 (1105,0,0,0)	1.00	14 (7,7)
steam1	BSPAI		1105 (556,537,12,0)	0.749	21 (7,7,7)
steam1	BSPAI		1105 (242,284,347,232)	0.426	21 (7,7,7)

Highlights:

Where uniform SPAI-GMRES-IR converges, bucketed SPAI-GMRES-IR also converges within a comparable number of iterations for small $\epsilon_B$.
Storage for the preconditioner can be reduced by up to 60% with only a mild increase in GMRES iterations.
For sufficiently small thresholds, the iteration count remains almost unchanged, with moderate storage reduction.

For randsvd and real-world matrices, the method achieves backward and forward errors comparable to those of uniform-precision SPAI-GMRES-IR. Preconditioner application costs are substantially reduced, and wall-time is a factor of 2–5 lower than for double-precision direct solves, with energy savings accruing from the cheap low-precision matvecs (Khan et al., 2023, Jiao, 2023).

6. Implementation and Practical Recommendations

Effective implementation of bucketed SPAI-GMRES-IR requires:

Contiguous storage layouts for each bucket, enabling efficient dispatch to SIMD or GPU kernels for each precision.
Specialized mixed-precision BLAS kernels for bucketed SpMV in half, single, and double precisions.
Orthogonalization in high precision (e.g., classical Gram–Schmidt in double) to maintain stability.
Row-wise parallelism in the bucketed SPAI matvecs, with bucketed computations fused to minimize synchronization overhead.
Global reductions fused and performed in high precision for communication-avoiding variants in distributed-memory contexts.
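
One way to picture the contiguous per-bucket layout is a list of coordinate-format blocks, one per precision, each holding its values in a single typed array. The sketch below is a hypothetical illustration (the names `split_into_buckets` and `spmv`, the magnitude cutoff, and the two-precision ladder are all assumptions), and it accumulates the products in double precision rather than in each bucket's own precision.

```python
import numpy as np


def split_into_buckets(M, cutoffs, dtypes):
    """Split the nonzeros of M into one contiguous COO block per precision.

    cutoffs are decreasing magnitudes: entries above cutoffs[0] use dtypes[0],
    entries in (cutoffs[1], cutoffs[0]] use dtypes[1], and so on.
    """
    rows, cols = np.nonzero(M)  # row-major order, so each block stays row-sorted
    vals = M[rows, cols]
    level = np.zeros(vals.size, dtype=np.int64)
    for c in cutoffs:
        level += np.abs(vals) <= c
    blocks = []
    for k, t in enumerate(dtypes):
        sel = level == k
        blocks.append((rows[sel], cols[sel], vals[sel].astype(t)))
    return blocks


def spmv(blocks, x, n_rows):
    """Apply the bucketed matrix: one pass per block, results summed in double."""
    y = np.zeros(n_rows)
    for rows, cols, vals in blocks:
        prod = vals * x[cols].astype(vals.dtype)  # products in the block's precision
        y += np.bincount(rows, weights=prod, minlength=n_rows)
    return y


rng = np.random.default_rng(1)
n = 30
M = rng.standard_normal((n, n)) * 10.0 ** (-rng.integers(0, 10, (n, n)))
x = rng.standard_normal(n)

blocks = split_into_buckets(M, cutoffs=(1e-6,), dtypes=(np.float64, np.float32))
y = spmv(blocks, x, n)
max_abs_err = np.max(np.abs(y - M @ x))
print([b[2].size for b in blocks], max_abs_err)
```

Because each block is a plain typed array, it can be handed to a precision-specific kernel without gathering entries at run time, which is the point of the contiguous layout.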

This architecture integrates readily into MPI+OpenMP or CUDA libraries, yielding EFBS-certified sparse solvers suitable for large-scale, mixed-precision HPC deployments (Jiao, 2023).

7. Theoretical and Practical Significance

Bucketed SPAI-GMRES-IR enables new forms of adaptive-precision preconditioning with provable error guarantees matching those of standard uniform-precision methods, contingent on regime-appropriate parameter selection. Empirical and theoretical analyses indicate that, under well-posedness and with proper bucket thresholding, the method is robust to the degradation often associated with low-precision arithmetic, facilitating energy and cost savings without sacrificing solution quality. A plausible implication is that further hardware trends toward mixed-precision support will amplify these gains for large-scale, sparse scientific computing (Khan et al., 2023, Jiao, 2023).


References

Khan et al. (2023). Mixed Precision Iterative Refinement with Adaptive Precision Sparse Approximate Inverse Preconditioning.

Jiao (2023). Optimal Solutions of Well-Posed Linear Systems via Low-Precision Right-Preconditioned GMRES with Forward and Backward Stabilization.
