Papers
Topics
Authors
Recent
Search
2000 character limit reached

Classical Iterative Refinement in Mixed Precision

Updated 5 May 2026
  • Classical iterative refinement is a numerical method that iteratively improves solutions of Ax=b by computing high-precision residuals and low-precision corrections.
  • It leverages a mixed-precision framework where a precomputed LU factorization in low precision is paired with high-precision residuals to balance speed and accuracy.
  • The method achieves linear convergence under proper conditioning and precision choices, ensuring stability when uₛκ(A) is kept below 1.

Classical iterative refinement (IR) is a numerical scheme for solving linear systems of the form Ax=bA x = b that incrementally improves an approximate solution by leveraging corrections computed from the current residual. Originally developed as a post-processing step to enhance the accuracy of Gaussian elimination, IR is now central in mixed-precision and low-precision numerical linear algebra. Its primary mechanism is to alternate between low-precision solves—typically using a precomputed LU factorization—and high-precision residual computation, thus combining the speed of low-precision arithmetic with the accuracy of high-precision feedback.

1. Algorithmic Structure and Precision Model

Given a nonsingular ARn×nA\in\mathbb{R}^{n\times n} and bRnb\in\mathbb{R}^n, classical IR generates a sequence {xk}k0\{x_k\}_{k\ge0} converging to the solution x=A1bx^*=A^{-1}b. The basic steps at iteration kk are:

  1. High-Precision Residual: rk=bAxkr_k = b - A x_k in precision uu (typically chosen so the residual can reveal errors undetectable at factorization precision) (Wu et al., 2023, Kelley, 2024).
  2. Low-Precision Correction: Solve Adk=rkA d_k = r_k using the precomputed LU factorization in solver precision usu_s. The correction solve can be carried out in different styles—on-the-fly (promotion of LU factors), in-place (demotion of residual), or direct low-precision, each with distinct interprecision transfer costs (Kelley, 2024).
  3. Solution Update: ARn×nA\in\mathbb{R}^{n\times n}0 in high precision.

IR can be embedded in a multi-precision framework with explicit storage, factorization, residual, and solver precisions: working (ARn×nA\in\mathbb{R}^{n\times n}1), factorization (ARn×nA\in\mathbb{R}^{n\times n}2), residual (ARn×nA\in\mathbb{R}^{n\times n}3), and solver (ARn×nA\in\mathbb{R}^{n\times n}4) precisions, with explicit interprecision transfer operators ARn×nA\in\mathbb{R}^{n\times n}5 (Kelley, 2024).

2. Theoretical Properties and Convergence Analysis

The convergence and stability of classical IR are governed by the interplay of matrix conditioning and the floating-point precisions used. Critical results include:

  • Iteration Error Propagation: For error ARn×nA\in\mathbb{R}^{n\times n}6, the update can be written as ARn×nA\in\mathbb{R}^{n\times n}7, simplifying to ARn×nA\in\mathbb{R}^{n\times n}8 upon proper manipulation, with ARn×nA\in\mathbb{R}^{n\times n}9 (Wu et al., 2023).
  • Linear Convergence Criteria:

bRnb\in\mathbb{R}^n0

With refined analysis (Wu et al., 2023):

bRnb\in\mathbb{R}^n1

IR converges linearly provided bRnb\in\mathbb{R}^n2 (practically, bRnb\in\mathbb{R}^n3 is required).

  • Promotion Effect: When the residual and corrections are computed in higher precision bRnb\in\mathbb{R}^n4, IR solves the promoted system bRnb\in\mathbb{R}^n5, with bRnb\in\mathbb{R}^n6, bRnb\in\mathbb{R}^n7; thus, the limiting solution is bRnb\in\mathbb{R}^n8 (Kelley, 2024). The forward error at termination is bounded by bRnb\in\mathbb{R}^n9 for a residual threshold {xk}k0\{x_k\}_{k\ge0}0.
  • Finite-Precision Limiting Accuracy: The attainable forward error is {xk}k0\{x_k\}_{k\ge0}1 if the {xk}k0\{x_k\}_{k\ge0}2 factorization in {xk}k0\{x_k\}_{k\ge0}3 is backward stable and the residual computation in {xk}k0\{x_k\}_{k\ge0}4 is accurate (Kelley, 2024).

3. Error Sources and Stability

The forward and backward error analyses in IR account for rounding in several locations:

  • Factorization Error ({xk}k0\{x_k\}_{k\ge0}5): The {xk}k0\{x_k\}_{k\ge0}6 factors {xk}k0\{x_k\}_{k\ge0}7 satisfy {xk}k0\{x_k\}_{k\ge0}8. This restricts {xk}k0\{x_k\}_{k\ge0}9 so that x=A1bx^*=A^{-1}b0 is necessary for backward stability (Kelley, 2024, Kelley, 2023).
  • Residual Error (x=A1bx^*=A^{-1}b1): Residual computation incurs x=A1bx^*=A^{-1}b2 error.
  • Triangular Solves (x=A1bx^*=A^{-1}b3): Affect the correction step and introduce trade-offs between x=A1bx^*=A^{-1}b4 upcasts (on-the-fly) versus x=A1bx^*=A^{-1}b5 transfers (in-place).

A crucial consequence is that IR cannot "repair" catastrophic instabilities from an unstable factorization—its effectiveness is contingent on having at least modestly stable x=A1bx^*=A^{-1}b6 factors and moderately conditioned x=A1bx^*=A^{-1}b7 (Wu et al., 2023, Kelley, 2023).

4. Precision Choices and Interprecision Transfers

Selecting optimal precisions for x=A1bx^*=A^{-1}b8, x=A1bx^*=A^{-1}b9, and kk0 depends on hardware, desired accuracy, and matrix conditioning:

  • Working/Residual Precision (kk1, kk2): kk3 should be much finer than kk4; ideally kk5 to guarantee a low error floor (Kelley, 2024).
  • Factorization Precision (kk6): Use the minimal precision such that kk7; half precision is restricted to very well-conditioned kk8, while single precision suffices for moderate kk9 (Kelley, 2024, Kelley, 2023).
  • Solver Precision (rk=bAxkr_k = b - A x_k0): On-the-fly (all promoted to rk=bAxkr_k = b - A x_k1) is robust for ill-conditioned or very low rk=bAxkr_k = b - A x_k2; in-place (working in rk=bAxkr_k = b - A x_k3) is efficient if rk=bAxkr_k = b - A x_k4.

A synthesis of these strategies enables balancing computation speed and solution accuracy, especially in modern hardware contexts (e.g., GPU, analog accelerators).

5. Practical Implementation and Termination

Implementation of classical IR in practical software reflects these principles. In MultiPrecisionArrays.jl, for instance, the "double–single" scheme—storing rk=bAxkr_k = b - A x_k5 and rk=bAxkr_k = b - A x_k6 in Float64, factorizing in Float32—enables fast factorization while correcting via high-precision residuals (Kelley, 2023). The loop continues until either

  • the residual norm is below a multiple of rk=bAxkr_k = b - A x_k7,
  • the backward error is sufficiently small,
  • or stagnation is detected, typically via a relative decrease threshold on rk=bAxkr_k = b - A x_k8 (Kelley, 2023).

Mixed-precision IR delivers nearly full working-precision accuracy at lower cost than a pure high-precision direct solve, provided that rk=bAxkr_k = b - A x_k9 is not too ill-conditioned.

6. Extensions, Failure Modes, and Optimality

Classical IR (Wilkinson IR, uu0) in working precision admits generalization to relaxed forms, IR(uu1), but uu2 is universally optimal for stability and contraction rate (Smoktunowicz et al., 2015). The method is typically limited by either divergence if the inner solve is unstable or the model error floor set by uu3. In highly ill-conditioned, low-precision, or stochastic environments, classical IR may fail, in which case stabilized (e.g., line-search) IR variants should be used (Wu et al., 2023).

Extensive numerical experiments demonstrate that one or a few steps of classical IR suffice to recover backward-stability in diverse scenarios, confirming its reliability and efficiency when used within its regime of validity (Smoktunowicz et al., 2015, Kelley, 2023).

7. Historical and Contemporary Impact

IR's renaissance is driven by its compatibility with heterogeneous hardware and mixed-precision computation (Wu et al., 2023). By combining the speed of low-precision arithmetic, storage, and factorization with the accuracy afforded by high-precision corrections, IR is foundational in software frameworks (e.g., Julia, MATLAB/BLAS/LAPACK) and is central in algorithmic innovations for emerging architectures.

Recent research clarifies the algorithm's behavior in promoted systems, makes explicit the role of interprecision transfer, and refines the theoretical understanding of attainable accuracy, guiding both the development and practical deployment of IR in scientific computing (Kelley, 2024).

Topic to Video (Beta)

No one has generated a video about this topic yet.

Whiteboard

No one has generated a whiteboard explanation for this topic yet.

Follow Topic

Get notified by email when new papers are published related to Classical Iterative Refinement (IR).