Mixed-Precision Iterative Refinement
- Mixed-precision iterative refinement is a numerical algorithm that employs various floating-point precisions for factorization, vector updates, and residual evaluations to achieve high accuracy.
- It strategically combines low-precision arithmetic for rapid computations with high-precision corrections to ensure numerical stability in methods like LU decomposition and GMRES.
- The approach is applied to dense, sparse, and structured systems, leveraging specialized hardware such as tensor cores for significant performance gains.
Mixed-precision iterative refinement (MP-IR) is a family of numerical algorithms for solving linear algebra problems efficiently and accurately, including linear systems, least-squares problems, matrix equations, and related computational kernels. By orchestrating workflows across multiple floating-point precisions, MP-IR exploits the high throughput of low-precision arithmetic while retaining the numerical stability and accuracy of high-precision operations. The paradigm is supported by rigorous convergence and error analysis, and underpins recent advances in high-performance and energy-efficient numerical linear algebra on contemporary hardware.
1. Algorithmic Principles and Iterative Refinement Architecture
MP-IR schemes iteratively refine a solution using a mixture of floating-point precisions, exploiting the asymmetry between computational costs and rounding error characteristics among IEEE formats (e.g., binary16/32/64, commonly known as fp16/fp32/fp64). The central workflow consists of four precision roles:
- Factorization precision ($u_f$): For initial decompositions (e.g., LU, QR), performed in the lowest viable precision that does not cause divergence.
- Working precision ($u$): For vector updates and, in some cases, arithmetic inside Krylov subspace methods or direct solvers.
- Residual evaluation precision ($u_r$): For residual computation, generally at least as accurate as the target precision for the desired accuracy.
- Effective solve precision ($u_s$): The precision at which the correction equation is effectively solved; it depends on the solver being used (e.g., direct triangular solves versus flexible GMRES) (Oktay et al., 2021).
The archetypical iteration (for $k = 0, 1, 2, \dots$) is:
$r^{(k)} = b - A x^{(k)} \quad\text{(in $u_r$)}, \qquad \text{Solve}\ A d^{(k)} = r^{(k)}\ \text{(in $u_s$)}, \qquad x^{(k+1)} = x^{(k)} + d^{(k)}\ \text{(in $u$)},$
with the factorization precomputed in $u_f$. In Krylov subspace variants, $d^{(k)}$ is obtained from a preconditioned GMRES or flexible GMRES solver, often using the factors computed in $u_f$ as preconditioners.
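The iteration above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not a reference implementation: the helper names `lu_factor`, `lu_solve`, and `mixed_precision_ir` are hypothetical, the LU factorization is hand-written so that its arithmetic really runs in the chosen factorization precision, and the defaults (fp32 factorization, fp64 working and residual precision) are one possible choice among many.

```python
import numpy as np

def lu_factor(A, dtype):
    """LU with partial pivoting; arithmetic is carried out in `dtype` (u_f)."""
    LU = np.array(A, dtype=dtype)            # round A to the factorization precision
    n = LU.shape[0]
    piv = np.arange(n)
    for k in range(n - 1):
        p = k + int(np.argmax(np.abs(LU[k:, k])))
        if p != k:                           # swap rows k and p
            LU[[k, p]] = LU[[p, k]]
            piv[[k, p]] = piv[[p, k]]
        LU[k + 1:, k] /= LU[k, k]
        LU[k + 1:, k + 1:] -= np.outer(LU[k + 1:, k], LU[k, k + 1:])
    return LU, piv

def lu_solve(LU, piv, b):
    """Solve with the stored factors; substitution runs in the factors' dtype."""
    n = LU.shape[0]
    y = np.array(b, dtype=LU.dtype)[piv]     # apply the row permutation
    for i in range(1, n):                    # forward substitution (unit lower L)
        y[i] -= LU[i, :i] @ y[:i]
    for i in range(n - 1, -1, -1):           # back substitution with U
        y[i] = (y[i] - LU[i, i + 1:] @ y[i + 1:]) / LU[i, i]
    return y

def mixed_precision_ir(A, b, u_f=np.float32, u=np.float64, u_r=np.float64,
                       max_iter=20, tol=None):
    """Refine a low-precision LU solve; returns (x, number of steps taken)."""
    if tol is None:
        tol = np.finfo(u).eps                # assumed stopping tolerance
    LU, piv = lu_factor(A, u_f)              # factorization in u_f, done once
    x = lu_solve(LU, piv, b).astype(u)       # initial solve from the cheap factors
    A_r, b_r = A.astype(u_r), b.astype(u_r)
    steps = 0
    for _ in range(max_iter):
        r = b_r - A_r @ x.astype(u_r)        # residual in u_r
        d = lu_solve(LU, piv, r).astype(u)   # correction from the u_f factors
        x = x + d                            # update in u
        steps += 1
        if np.linalg.norm(d, np.inf) <= tol * np.linalg.norm(x, np.inf):
            break                            # may not trigger when u_r equals u
    return x, steps
```

With `u_r` equal to `u`, the correction norm can plateau slightly above the tolerance, so `max_iter` is what bounds the loop in that case. Casting the residual to fp16 instead of fp32 would additionally require scaling to avoid underflow, which this sketch does not attempt.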
2. Convergence Constraints and Error Estimates
The convergence of MP-IR is controlled by the conditioning of the underlying problem and the precision in which the factorization and iterative solves are performed. Classical results show:
- Standard IR (SIR): Converges if $\kappa_\infty(A) \le u_f^{-1}$, reaching working-precision backward error $O(u)$.
- GMRES-based IR (GMRES-IR): Extends the solvable regime to $\kappa_\infty(A) \le u^{-1/2} u_f^{-1}$ when GMRES matvecs are evaluated in higher-than-working precision; the bound tightens to $\kappa_\infty(A) \le u^{-1/3} u_f^{-2/3}$ if all Krylov operations are in $u$ (Oktay et al., 2021).
- Switching and Stagnation: Monitoring quantities such as $\|d^{(k)}\|_\infty / \|x^{(k)}\|_\infty$ and $\|d^{(k+1)}\|_\infty / \|d^{(k)}\|_\infty$ allows automated detection of slow convergence or stagnation, triggering either a solver upgrade (e.g., from SIR to GMRES-IR) or an increase in computational precision.
The error analysis confirms that forward and backward accuracy at the level of the target (highest) precision is attainable, provided these constraints are observed and the number of refinement steps is moderate.
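A stagnation check of this kind might be sketched as follows. The sketch assumes two commonly used normwise monitors, the relative correction size $\|d^{(k)}\| / \|x^{(k)}\|$ and the ratio of successive correction norms; the function names and the threshold `rho_thresh=0.5` are illustrative assumptions rather than values taken from the cited work.

```python
def classify_step(d_norm, d_prev_norm, x_norm, u, rho_thresh=0.5):
    """Return 'converged', 'stagnating', or 'continue' for one refinement step."""
    if d_norm <= u * x_norm:                 # correction is at the working-precision level
        return "converged"
    if d_prev_norm is not None and d_norm >= rho_thresh * d_prev_norm:
        return "stagnating"                  # corrections are not shrinking fast enough
    return "continue"

def first_decision(d_norms, x_norm, u, max_steps=10):
    """Scan a history of correction norms; report the first non-'continue' event."""
    prev = None
    for k, d in enumerate(d_norms[:max_steps]):
        status = classify_step(d, prev, x_norm, u)
        if status != "continue":
            return k, status
        prev = d
    return min(len(d_norms), max_steps), "iteration limit"
```

A "stagnating" or "iteration limit" outcome is the point at which a driver would switch to a stronger solver or a higher precision; a "converged" outcome ends the refinement.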
3. Multistage and Adaptive Refinement Strategies
The multistage MP-IR (MSIR) approach orchestrates a progression through increasingly robust (but more costly) refinement stages, balancing performance and reliability (Oktay et al., 2021):
- Stage 1: SIR (LU-based, low precision solves)
- Stage 2: SGMRES-IR (all GMRES work in working precision)
- Stage 3: GMRES-IR (GMRES/SpMV in squared working precision)
- Stage 4: Precision promotion and refactorization
This "stronger-solver-first" philosophy upgrades the solver before refactorization, minimizing high-precision factorization cost. Switching is governed by convergence of error monitors and limits on the number of iterations at each stage. The cost model and numerical evidence confirm that for easy problems (well-conditioned $A$), MSIR requires only the cheapest stage; for ill-conditioned $A$, the full pipeline enables convergence where single-stage methods would fail or require excessive resources.
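| Stage | Solve Method | Required Condition Number Bound |
|---|---|---|
| SIR | Triangular solve with $u_f$ factors | $\kappa_\infty(A) \le u_f^{-1}$ |
| SGMRES-IR | GMRES all in $u$ | $\kappa_\infty(A) \le u^{-1/3} u_f^{-2/3}$ |
| GMRES-IR | GMRES SpMV in $u^2$ | $\kappa_\infty(A) \le u^{-1/2} u_f^{-1}$ |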
MSIR achieves minimal total high-precision work, often requiring few, if any, high-cost refactorizations and leveraging the adaptive "stage escalation" only as dictated by empirical convergence (Oktay et al., 2021).
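To get a feel for the size of these regimes, the following sketch evaluates the per-stage bounds for concrete IEEE formats. It assumes the bounds $u_f^{-1}$, $u^{-1/3} u_f^{-2/3}$, and $u^{-1/2} u_f^{-1}$ for SIR, SGMRES-IR, and GMRES-IR respectively, and the standard unit roundoffs $2^{-11}$, $2^{-24}$, $2^{-53}$; the function names are hypothetical.

```python
# Unit roundoffs for IEEE binary16/32/64 with round-to-nearest.
UNIT_ROUNDOFF = {"fp16": 2.0 ** -11, "fp32": 2.0 ** -24, "fp64": 2.0 ** -53}

def stage_bounds(u_f, u):
    """Assumed condition-number bounds per stage, ordered from cheapest to strongest."""
    return {
        "SIR": 1.0 / u_f,
        "SGMRES-IR": u ** (-1.0 / 3.0) * u_f ** (-2.0 / 3.0),
        "GMRES-IR": u ** (-0.5) / u_f,
    }

def cheapest_stage(kappa, u_f, u):
    """First stage whose bound admits kappa, or a refactorization hint."""
    for stage, bound in stage_bounds(u_f, u).items():
        if kappa <= bound:
            return stage
    return "refactorize in higher precision"

if __name__ == "__main__":
    u_f, u = UNIT_ROUNDOFF["fp16"], UNIT_ROUNDOFF["fp64"]
    for stage, bound in stage_bounds(u_f, u).items():
        print(f"{stage:10s} kappa <= {bound:.2e}")
```

This lookup only illustrates the table: the multistage scheme itself does not need an estimate of the condition number, since it escalates based on observed convergence.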
4. Algorithmic Variants and Domain-Specific Extensions
MP-IR underlies a diverse range of algorithms beyond standard dense linear system solvers:
- Sparse and Structured Systems: The MP-IR paradigm has been adapted to algorithms exploiting sparse matrix structure, including preconditioned GMRES with sparse approximate inverse (SPAI) or incomplete LU preconditioning (Carson et al., 2022, Khan et al., 2023). Adaptive-precision preconditioner application can reduce storage and execution time, with convergence maintained when the preconditioned matrix remains well-conditioned.
- Iterative Refinement for Matrix Equations: Mixed-precision IR methods have been developed for Lyapunov and Sylvester equations, employing stationary iterative schemes and low-precision Schur decompositions, followed by refinement steps in higher precision. Convergence is guaranteed provided the key operators’ condition numbers are below the reciprocal of the solver precision (Dmytryshyn et al., 5 Mar 2025, Benner et al., 2 Oct 2025).
- Least Squares and Inverse Problems: MP-IR has been applied to both standard and generalized least-squares problems, with careful augmentation and preconditioning leading to accurate and robust solutions over a broad conditioning regime (Gao et al., 2024, Carson et al., 2024, Carson et al., 2024). In Tikhonov-regularized settings, the refinement sequence can be interpreted through a filter-factor formalism (Nagy et al., 2024).
- Precision-Boosted Hybrid Algorithms: In optimization, MP-IR can be combined with precision boosting, where iterative refinement in the base precision is complemented by promotion to higher precision upon failure, ensuring both robustness and efficiency for linear programming (Eifler et al., 2023).
- Alternative Number Systems: Posit-based IR exhibits comparable or superior convergence relative to IEEE fp16 in the presence of equilibration, illustrating extensibility beyond standard IEEE formats (Quinlan et al., 2024).
5. Hardware and Software Implementation Considerations
MP-IR is particularly effective on modern architectures equipped with specialized low-precision hardware (tensor cores, vector engines, or SIMD units). Key implementation observations include:
- Low-precision arithmetic accelerates dense matrix-matrix operations, with fp16/fp32 delivering speedups over fp64 whose magnitude depends on device and operation (Dongarra et al., 23 Sep 2025, Oktay et al., 2021).
- SIMD acceleration for multi-component multiple-precision (double-double, quad-double) arithmetic enables orders-of-magnitude improvements in direct methods, yielding performance advantages in MP-IR frameworks (Kouya, 2021).
- On problem classes suitable for mixed-precision IR (e.g., $\kappa_\infty(A)$ below the theoretical bound), double-precision backward error is achieved with minimal high-precision work, with reported speedups over all-high-precision alternatives (Gao et al., 2022, Ge et al., 8 Jan 2025).
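The building block behind multi-component formats such as double-double is an error-free transformation that captures the rounding error of a floating-point addition exactly. The sketch below shows the classical TwoSum transformation and a simplified compensated accumulation built on it; it is a stand-in for illustration, not a full double-double library, and the name `dd_sum` is hypothetical.

```python
def two_sum(a, b):
    """Return (s, e) with s = fl(a + b) and s + e == a + b exactly (barring overflow)."""
    s = a + b
    bb = s - a
    e = (a - (s - bb)) + (b - bb)
    return s, e

def dd_sum(values):
    """Accumulate in a (hi, lo) pair so that low-order bits are not discarded."""
    hi, lo = 0.0, 0.0
    for v in values:
        hi, e = two_sum(hi, v)   # e is the rounding error of this addition
        lo += e                  # errors are collected in the low component
    return two_sum(hi, lo)       # renormalize the pair
```

For example, `dd_sum([1.0, 1e-20, -1.0])` retains the small term that a plain left-to-right fp64 evaluation `(1.0 + 1e-20) - 1.0` loses. Residuals accumulated this way behave as if computed in roughly twice the working precision.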
Algorithmic frameworks have been ported to high-level languages (Python, via C wrappers; native C/C++/Fortran) and integrated into scientific software ecosystems, leveraging standard BLAS/LAPACK routines as well as high-performance custom kernels.
6. Practical Guidance, Limitations, and Research Directions
A suite of practical rules emerges from the collective research corpus:
- Choose the lowest factorization precision such that $\kappa_\infty(A) \le u_f^{-1}$, escalating via solver upgrades or refactorization only if convergence criteria fail (Oktay et al., 2021).
- Incorporate diagonal scaling and equilibration prior to low-precision conversion to mitigate over/underflow and improve effective conditioning (Dongarra et al., 23 Sep 2025, Quinlan et al., 2024).
- For indefinite, ill-conditioned, or highly structured problems, multistage or five-precision frameworks, as well as domain-specific refinement strategies, may be requisite.
- Numerical experiments repeatedly demonstrate that adaptive multistage MP-IR attains working-precision backward error on hard instances at a total high-precision computational cost comparable to the cheapest scheme on easy problems.
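The scaling rule above can be illustrated with a simple two-sided diagonal equilibration applied before casting to fp16. This is a minimal sketch assuming max-norm row scaling followed by column scaling; the function name `equilibrate` and the example matrix are hypothetical, and production schemes may use different scalings or an additional safety factor.

```python
import numpy as np

def equilibrate(A):
    """Return (R A S, r, s) with diagonal scalings R = diag(r), S = diag(s)."""
    r = 1.0 / np.max(np.abs(A), axis=1)      # row scaling: largest entry per row -> 1
    B = A * r[:, None]
    s = 1.0 / np.max(np.abs(B), axis=0)      # column scaling: largest entry per column -> 1
    return B * s[None, :], r, s

# Entries span many orders of magnitude, far outside the fp16 range (max about 65504).
A = np.array([[4.0e8, 1.0e8, 2.0e8],
              [3.0,   6.0,   1.0],
              [2.0e-9, 1.0e-9, 5.0e-9]])

with np.errstate(over="ignore"):
    A16_raw = A.astype(np.float16)           # overflows to inf and underflows to 0

A_scaled, r, s = equilibrate(A)
A16 = A_scaled.astype(np.float16)            # all entries now lie in (0, 1]

# The original system A x = b becomes (R A S) y = R b with x = S y.
x_true = np.array([1.0, 2.0, 3.0])
b = A @ x_true
x = s * np.linalg.solve(A_scaled, r * b)
```

The scaled matrix is what would be handed to the low-precision factorization; the scalings `r` and `s` are kept in higher precision so the solution of the original system can be recovered.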
Limitations remain, most prominently concerning extremely ill-conditioned problems that exceed the convergence envelope, or domains where factorization in any practical low precision is unreliable. The need for careful parameter tuning, such as convergence thresholds and inner solver tolerances, also persists.
Ongoing research directions include generalizing MP-IR frameworks to broader problem classes (nonlinear and eigenvalue problems), automating parameter selection (e.g., the regularization parameter in parameter-regularized schemes via machine learning (Ge et al., 8 Jan 2025)), and hardware-adaptive resource scheduling.
References
- Oktay, E. & Carson, E. "Multistage Mixed Precision Iterative Refinement" (Oktay et al., 2021)
- Buttari, A. et al., "Mixed precision iterative refinement for dense linear systems", IJHPCA 21(4):457–466, 2007.
- Kouya, T., "Accelerated Multiple Precision Direct Method and Mixed Precision Iterative Refinement on Python Programming Environment" (Kouya, 2021)
- HPL-MxP Benchmark (Dongarra et al., 23 Sep 2025)
- Iterative Refinement with Low-Precision Posits (Quinlan et al., 2024)
- Mixed-precision iterative refinement for low-rank Lyapunov equations (Benner et al., 2 Oct 2025)
- Iterative Refinement of Schur decompositions (Bujanović et al., 2022)
- Combining Precision Boosting with LP Iterative Refinement for Exact Linear Optimization (Eifler et al., 2023)
- Mixed precision iterative refinement for linear inverse problems (Nagy et al., 2024)
- Mixed-precision algorithms for solving the Sylvester matrix equation (Dmytryshyn et al., 5 Mar 2025)
- Mixed Precision GMRES-based Iterative Refinement with Recycling (Oktay et al., 2022)
- A Study of Mixed Precision Strategies for GMRES on GPUs (Loe et al., 2021)
- Three-precision iterative refinement with parameter regularization and prediction for solving large sparse linear systems (Ge et al., 8 Jan 2025)
- Mixed Precision FGMRES-Based Iterative Refinement for Weighted Least Squares (Carson et al., 2024)
- Mixed precision sketching for least-squares problems and its application in GMRES-based iterative refinement (Carson et al., 2024)
- Mixed precision iterative refinement for least squares with linear equality constraints and generalized least squares problems (Gao et al., 2024)
- Mixed Precision Iterative Refinement with Adaptive Precision Sparse Approximate Inverse Preconditioning (Khan et al., 2023)
- Mixed Precision Iterative Refinement with Sparse Approximate Inverse Preconditioning (Carson et al., 2022)
- A comparison of mixed precision iterative refinement approaches for least-squares problems (Carson et al., 2024)