Classical Iterative Refinement in Mixed Precision

Updated 5 May 2026

Classical iterative refinement is a numerical method that iteratively improves solutions of Ax=b by computing high-precision residuals and low-precision corrections.
It leverages a mixed-precision framework where a precomputed LU factorization in low precision is paired with high-precision residuals to balance speed and accuracy.
The method achieves linear convergence under proper conditioning and precision choices, ensuring stability when uₛκ(A) is kept below 1.

Classical iterative refinement (IR) is a numerical scheme for solving linear systems of the form $A x = b$ that incrementally improves an approximate solution by leveraging corrections computed from the current residual. Originally developed as a post-processing step to enhance the accuracy of Gaussian elimination, IR is now central in mixed-precision and low-precision numerical linear algebra. Its primary mechanism is to alternate between low-precision solves—typically using a precomputed LU factorization—and high-precision residual computation, thus combining the speed of low-precision arithmetic with the accuracy of high-precision feedback.

1. Algorithmic Structure and Precision Model

Given a nonsingular $A\in\mathbb{R}^{n\times n}$ and $b\in\mathbb{R}^n$ , classical IR generates a sequence $\{x_k\}_{k\ge0}$ converging to the solution $x^*=A^{-1}b$ . The basic steps at iteration $k$ are:

High-Precision Residual: $r_k = b - A x_k$ in precision $u$ (typically chosen so the residual can reveal errors undetectable at factorization precision) (Wu et al., 2023, Kelley, 2024).
Low-Precision Correction: Solve $A d_k = r_k$ using the precomputed LU factorization in solver precision $u_s$ . The correction solve can be carried out in different styles—on-the-fly (promotion of LU factors), in-place (demotion of residual), or direct low-precision, each with distinct interprecision transfer costs (Kelley, 2024).
Solution Update: $A\in\mathbb{R}^{n\times n}$ 0 in high precision.

IR can be embedded in a multi-precision framework with explicit storage, factorization, residual, and solver precisions: working ( $A\in\mathbb{R}^{n\times n}$ 1), factorization ( $A\in\mathbb{R}^{n\times n}$ 2), residual ( $A\in\mathbb{R}^{n\times n}$ 3), and solver ( $A\in\mathbb{R}^{n\times n}$ 4) precisions, with explicit interprecision transfer operators $A\in\mathbb{R}^{n\times n}$ 5 (Kelley, 2024).

2. Theoretical Properties and Convergence Analysis

The convergence and stability of classical IR are governed by the interplay of matrix conditioning and the floating-point precisions used. Critical results include:

Iteration Error Propagation: For error $A\in\mathbb{R}^{n\times n}$ 6, the update can be written as $A\in\mathbb{R}^{n\times n}$ 7, simplifying to $A\in\mathbb{R}^{n\times n}$ 8 upon proper manipulation, with $A\in\mathbb{R}^{n\times n}$ 9 (Wu et al., 2023).
Linear Convergence Criteria:

$b\in\mathbb{R}^n$ 0

With refined analysis (Wu et al., 2023):

$b\in\mathbb{R}^n$ 1

IR converges linearly provided $b\in\mathbb{R}^n$ 2 (practically, $b\in\mathbb{R}^n$ 3 is required).

Promotion Effect: When the residual and corrections are computed in higher precision $b\in\mathbb{R}^n$ 4, IR solves the promoted system $b\in\mathbb{R}^n$ 5, with $b\in\mathbb{R}^n$ 6, $b\in\mathbb{R}^n$ 7; thus, the limiting solution is $b\in\mathbb{R}^n$ 8 (Kelley, 2024). The forward error at termination is bounded by $b\in\mathbb{R}^n$ 9 for a residual threshold $\{x_k\}_{k\ge0}$ 0.
Finite-Precision Limiting Accuracy: The attainable forward error is $\{x_k\}_{k\ge0}$ 1 if the $\{x_k\}_{k\ge0}$ 2 factorization in $\{x_k\}_{k\ge0}$ 3 is backward stable and the residual computation in $\{x_k\}_{k\ge0}$ 4 is accurate (Kelley, 2024).

3. Error Sources and Stability

The forward and backward error analyses in IR account for rounding in several locations:

Factorization Error ( $\{x_k\}_{k\ge0}$ 5): The $\{x_k\}_{k\ge0}$ 6 factors $\{x_k\}_{k\ge0}$ 7 satisfy $\{x_k\}_{k\ge0}$ 8. This restricts $\{x_k\}_{k\ge0}$ 9 so that $x^*=A^{-1}b$ 0 is necessary for backward stability (Kelley, 2024, Kelley, 2023).
Residual Error ( $x^*=A^{-1}b$ 1): Residual computation incurs $x^*=A^{-1}b$ 2 error.
Triangular Solves ( $x^*=A^{-1}b$ 3): Affect the correction step and introduce trade-offs between $x^*=A^{-1}b$ 4 upcasts (on-the-fly) versus $x^*=A^{-1}b$ 5 transfers (in-place).

A crucial consequence is that IR cannot "repair" catastrophic instabilities from an unstable factorization—its effectiveness is contingent on having at least modestly stable $x^*=A^{-1}b$ 6 factors and moderately conditioned $x^*=A^{-1}b$ 7 (Wu et al., 2023, Kelley, 2023).

4. Precision Choices and Interprecision Transfers

Selecting optimal precisions for $x^*=A^{-1}b$ 8, $x^*=A^{-1}b$ 9, and $k$ 0 depends on hardware, desired accuracy, and matrix conditioning:

Working/Residual Precision ( $k$ 1, $k$ 2): $k$ 3 should be much finer than $k$ 4; ideally $k$ 5 to guarantee a low error floor (Kelley, 2024).
Factorization Precision ( $k$ 6): Use the minimal precision such that $k$ 7; half precision is restricted to very well-conditioned $k$ 8, while single precision suffices for moderate $k$ 9 (Kelley, 2024, Kelley, 2023).
Solver Precision ( $r_k = b - A x_k$ 0): On-the-fly (all promoted to $r_k = b - A x_k$ 1) is robust for ill-conditioned or very low $r_k = b - A x_k$ 2; in-place (working in $r_k = b - A x_k$ 3) is efficient if $r_k = b - A x_k$ 4.

A synthesis of these strategies enables balancing computation speed and solution accuracy, especially in modern hardware contexts (e.g., GPU, analog accelerators).

5. Practical Implementation and Termination

Implementation of classical IR in practical software reflects these principles. In MultiPrecisionArrays.jl, for instance, the "double–single" scheme—storing $r_k = b - A x_k$ 5 and $r_k = b - A x_k$ 6 in Float64, factorizing in Float32—enables fast factorization while correcting via high-precision residuals (Kelley, 2023). The loop continues until either

the residual norm is below a multiple of $r_k = b - A x_k$ 7,
the backward error is sufficiently small,
or stagnation is detected, typically via a relative decrease threshold on $r_k = b - A x_k$ 8 (Kelley, 2023).

Mixed-precision IR delivers nearly full working-precision accuracy at lower cost than a pure high-precision direct solve, provided that $r_k = b - A x_k$ 9 is not too ill-conditioned.

6. Extensions, Failure Modes, and Optimality

Classical IR (Wilkinson IR, $u$ 0) in working precision admits generalization to relaxed forms, IR( $u$ 1), but $u$ 2 is universally optimal for stability and contraction rate (Smoktunowicz et al., 2015). The method is typically limited by either divergence if the inner solve is unstable or the model error floor set by $u$ 3. In highly ill-conditioned, low-precision, or stochastic environments, classical IR may fail, in which case stabilized (e.g., line-search) IR variants should be used (Wu et al., 2023).

Extensive numerical experiments demonstrate that one or a few steps of classical IR suffice to recover backward-stability in diverse scenarios, confirming its reliability and efficiency when used within its regime of validity (Smoktunowicz et al., 2015, Kelley, 2023).

7. Historical and Contemporary Impact

IR's renaissance is driven by its compatibility with heterogeneous hardware and mixed-precision computation (Wu et al., 2023). By combining the speed of low-precision arithmetic, storage, and factorization with the accuracy afforded by high-precision corrections, IR is foundational in software frameworks (e.g., Julia, MATLAB/BLAS/LAPACK) and is central in algorithmic innovations for emerging architectures.

Recent research clarifies the algorithm's behavior in promoted systems, makes explicit the role of interprecision transfer, and refines the theoretical understanding of attainable accuracy, guiding both the development and practical deployment of IR in scientific computing (Kelley, 2024).