Probabilistic Rounding Error Estimators

Updated 29 May 2026

Probabilistic rounding error estimators model floating-point rounding as independent, zero-mean random variables to derive high-confidence error bounds.
They use concentration inequalities and martingale techniques to achieve error bounds that scale as O(√n u), significantly improving over deterministic O(n u) estimates.
These methods guide mixed-precision algorithm design and resource allocation, demonstrating strong empirical performance in modern hardware-accelerated computing.

A probabilistic rounding error estimator is a methodology or explicit formula that quantifies, with specified probability, the accumulated numerical error introduced by rounding in floating-point arithmetic. Unlike deterministic bounds, which constrain the error in the worst case (valid for all possible input data and rounding sequences), probabilistic estimators make stochastic assumptions (e.g., that rounding errors are independent, bounded, mean-zero random variables) and use concentration inequalities to derive high-probability bounds. This approach often yields significantly tighter and more realistic error estimates, especially in large-scale or low-precision computations, and is now central to modern error analysis for mixed-precision and hardware-accelerated scientific computing (Bhola et al., 2024).

1. Mathematical Setting and Assumptions

Modern probabilistic rounding error analysis starts from the standard floating-point model $\operatorname{fl}(z) = z(1 + \delta)$, $|\delta| \le u$, with unit roundoff $u$. For compound operations, such as iterated summations or matrix products, the rounding errors accumulate in factors of the form $\prod_{i=1}^n (1+\delta_i)^{\rho_i}$, where each $\delta_i$ is (in the probabilistic analysis) an i.i.d. or mean-independent random variable with mean zero and $|\delta_i| \le u$, frequently taken to be uniform on $[-u, u]$, and $\rho_i = \pm 1$ encodes the operation (e.g., multiplication versus division).
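The floating-point model above can be checked empirically. The following is a minimal sketch, assuming IEEE binary32 with round-to-nearest (so $u = 2^{-24}$) and using only the Python standard library; the helper names `fl32` and `relative_rounding_error` are illustrative and not taken from the cited papers.

```python
import random
import struct

U32 = 2.0 ** -24  # unit roundoff of IEEE binary32 under round-to-nearest


def fl32(x: float) -> float:
    """Round a binary64 Python float to the nearest binary32 value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def relative_rounding_error(z: float) -> float:
    """Return delta such that fl32(z) = z * (1 + delta)."""
    return (fl32(z) - z) / z


rng = random.Random(0)
deltas = [relative_rounding_error(rng.uniform(1.0, 2.0)) for _ in range(10_000)]
max_abs_delta = max(abs(d) for d in deltas)
mean_delta = sum(deltas) / len(deltas)

print(f"max |delta| / u = {max_abs_delta / U32:.3f}")   # never exceeds 1
print(f"mean delta / u  = {mean_delta / U32:+.4f}")     # close to 0 for this input model
```

The near-zero sample mean is what motivates the mean-zero assumption for generic data; it is an empirical observation for this input distribution, not a guarantee.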

Classical deterministic bounds (e.g., $\gamma_n = n u / (1 - n u)$) cover the case in which every $\delta_i$ attains the extreme magnitude $u$ and all errors reinforce one another ("worst-case analysis"). The probabilistic approach leverages the actual distribution and independence (or weak dependence) of the rounding errors to show, via concentration of measure, that such worst-case accumulations are highly unlikely.
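A first-order heuristic shows where the gap between the two viewpoints comes from. This is an informal sketch under the assumption that the $\delta_i$ are mean-zero and uncorrelated; it is not the rigorous argument of the cited papers. Expanding the product gives

$$\prod_{i=1}^n (1+\delta_i)^{\rho_i} = 1 + \sum_{i=1}^n \rho_i \delta_i + \text{higher-order terms in } u.$$

In the worst case the first-order term satisfies $\left|\sum_{i=1}^n \rho_i \delta_i\right| \le n u$, whereas under the stochastic assumptions

$$\mathbb{E}\!\left[\Big(\sum_{i=1}^n \rho_i \delta_i\Big)^{2}\right] = \sum_{i=1}^n \mathbb{E}[\delta_i^2] \le n u^2,$$

so its typical magnitude is at most $\sqrt{n}\,u$.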

2. Key Probabilistic Lemmas and Theorems

A foundational result is the variance-informed probabilistic error bound for products of elementary rounding errors [(Bhola et al., 2024), Lemma 2.2]: $\prod_{i=1}^n (1+\delta_i)^{\rho_i} = 1 + \tilde{\theta}_n^u$, where $|\tilde{\theta}_n^u|$ is bounded by a probabilistic analogue of $\gamma_n$ with probability at least a confidence level controlled by a free parameter of the lemma. Both the bound and the confidence level are given by explicit expressions; see (Bhola et al., 2024) for full details.

For basic operations (FMA, MPFMA), analogous formulas quantify the relative forward error in terms of the inputs' absolute values and the probabilistically defined error radii, with high confidence [(Bhola et al., 2024), Lemmas 3.4, 3.7].

The probabilistic bound scales as $O(\sqrt{n}\,u)$, in sharp contrast to the deterministic bound's $O(nu)$. This is a consequence of modeling error accumulation as a random walk rather than a worst-case sum, a key insight justified rigorously by martingale techniques and concentration inequalities such as Azuma–Hoeffding or Bernstein bounds (Ipsen et al., 2019, Bhola et al., 2024, Bhola et al., 2024).
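The random-walk picture can be illustrated with a small Monte Carlo experiment. This is a sketch that assumes the idealized model in which the $\delta_i$ are independent and uniform on $[-u, u]$ and that keeps only the first-order term $\sum_i \delta_i$; the function name and the parameter values are chosen for illustration only.

```python
import math
import random


def accumulated_error(n: int, u: float, rng: random.Random) -> float:
    """First-order accumulated error sum_i delta_i with delta_i ~ U[-u, u]."""
    return sum(rng.uniform(-u, u) for _ in range(n))


rng = random.Random(1)
n, u, trials = 10_000, 2.0 ** -24, 100

worst_observed = max(abs(accumulated_error(n, u, rng)) for _ in range(trials))

# Compare the largest observed error with the two scalings.
print(f"worst observed / (sqrt(n) u) = {worst_observed / (math.sqrt(n) * u):.2f}")
print(f"worst observed / (n u)       = {worst_observed / (n * u):.4f}")
```

Under this model the largest observed error stays within a small multiple of $\sqrt{n}\,u$ and far below $n u$.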

3. Derivation Methodology

The standard framework involves:

Random variable modeling: Treating rounding errors $\delta_i$ as i.i.d. (or mean-independent) with $\mathbb{E}[\delta_i] = 0$, $|\delta_i| \le u$.
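Error propagation: Expressing the total forward or backward error as a (multi-)linear function or product of these increments, e.g., for dot products, polynomials, or matrix products. For a dot product, the computed value has the schematic form

$\hat{s} = \sum_{j=1}^{n} x_j y_j \prod_{i \in \mathcal{I}_j} (1+\delta_i)$, where $\mathcal{I}_j$ indexes the roundings that affect the $j$-th term.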

Variance or martingale analysis: Bounding the variance of the accumulated error using explicit expressions (Arar et al., 2022), or building a martingale whose increments reflect the propagation of random rounding, enabling Azuma–Hoeffding bounds (Castro et al., 2024, Ipsen et al., 2019).
Concentration inequality: Applying Chebyshev's inequality (variance-based, gives a bound proportional to $\sqrt{n/\lambda}\,u$ for probability at least $1-\lambda$) or Azuma–Hoeffding (yields $O(\sqrt{n \ln(2/\lambda)}\,u)$ bounds with probability at least $1-\lambda$).
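The last step of the framework can be made concrete for the first-order error $\sum_i \delta_i$. The sketch below assumes independent, mean-zero $\delta_i$ with $|\delta_i| \le u$, so that the variance is at most $n u^2$ and Hoeffding's inequality applies with increments in $[-u, u]$. The symbol `lam` denotes the failure probability; the function names are illustrative.

```python
import math


def chebyshev_radius(n: int, u: float, lam: float) -> float:
    """Radius t with P(|sum| >= t) <= lam from Chebyshev, using Var <= n u^2."""
    return u * math.sqrt(n / lam)


def hoeffding_radius(n: int, u: float, lam: float) -> float:
    """Radius t with P(|sum| >= t) <= lam from Hoeffding's inequality,
    obtained by solving 2 * exp(-t^2 / (2 n u^2)) = lam for t."""
    return u * math.sqrt(2.0 * n * math.log(2.0 / lam))


n, u = 10 ** 6, 2.0 ** -24
for lam in (0.5, 0.1, 0.01):
    cheb = chebyshev_radius(n, u, lam) / u
    hoef = hoeffding_radius(n, u, lam) / u
    print(f"lam={lam:<5} Chebyshev={cheb:9.1f} u   Hoeffding={hoef:9.1f} u   worst case={n} u")
```

Chebyshev's polynomial dependence on `lam` gives the smaller radius at moderate failure probabilities, while the logarithmic dependence of Hoeffding's bound gives the smaller radius at small ones.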

4. Comparison to Deterministic Error Bounds

The deterministic bound for $n$ chained operations is

$\gamma_n = \dfrac{n u}{1 - n u}$

growing linearly in $n$.

The probabilistic estimator achieves

$|\tilde{\theta}_n^u| = O(\sqrt{n}\,u)$

with high probability, i.e., the bound grows like $\sqrt{n}\,u$, tighter by a factor of $\sqrt{n}$ for large $n$ (Bhola et al., 2024). For example, in the matrix-matrix multiplication experiments reported there, the probabilistic bound is nearly an order of magnitude tighter than the deterministic estimate.
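The two growth rates can be tabulated directly. The sketch below compares $\gamma_n$ with the bare scaling $\sqrt{n}\,u$ (constants and confidence factors omitted, which is an assumption of this illustration). It also shows that $\gamma_n$ becomes vacuous once $n u \ge 1$, which happens quickly in half precision ($u = 2^{-11}$).

```python
import math


def gamma(n: int, u: float) -> float:
    """Deterministic constant gamma_n = n u / (1 - n u); vacuous if n u >= 1."""
    return math.inf if n * u >= 1.0 else n * u / (1.0 - n * u)


def sqrt_scaling(n: int, u: float) -> float:
    """Leading-order probabilistic scaling sqrt(n) * u, without constants."""
    return math.sqrt(n) * u


for label, u in (("fp32", 2.0 ** -24), ("fp16", 2.0 ** -11)):
    for n in (10 ** 2, 10 ** 3, 4096, 10 ** 6):
        print(f"{label} n={n:>7}: gamma_n={gamma(n, u):.3e}  sqrt(n)u={sqrt_scaling(n, u):.3e}")
```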

Martingale-based estimators, universal across a broad class of multi-linear or sum-product numerical algorithms (e.g., sum, polynomial evaluation via Horner's method, Karatsuba multiplication), provide $O(\sqrt{n}\,u)$ bounds with explicit constants (Castro et al., 2024).

Variance-based Chebyshev estimators, as in (Arar et al., 2022), can improve logarithmic factors for moderate failure probabilities, yielding, to leading order,

$\dfrac{|\hat{y} - y|}{|y|} \lesssim \kappa\, u \sqrt{n/\lambda}$ with probability at least $1-\lambda$,

where $\kappa$ is an appropriate normwise condition number, and $0 < \lambda < 1$.

5. Application: Algorithm-Specific Estimators

Fused Multiply-Add (FMA) and Mixed-Precision FMA (MPFMA): Probabilistic estimators yield forward-error bounds refined by the probabilistic analogues of the $\gamma_n$ constants; see (Bhola et al., 2024) for explicit formulas and their dependence on the precisions used for operands and accumulators.

Tensor Core GEMM: For hardware-implemented matrix-matrix multiplication, probabilistic error estimators account for block partitioning and the number of accumulations per block, yielding tight, high-confidence forward-error bounds that are empirically nearly an order of magnitude below the deterministic guarantees (Bhola et al., 2024).

Vector and Matrix Sums/Products: Probabilistic estimators using martingale or variance-based techniques are applicable to classical inner products, summations (including pairwise strategies), and nonlinear kernels (e.g., variance computation), consistently demonstrating reduced $O(\sqrt{n}\,u)$ or even $O(\sqrt{\log n}\,u)$ scaling in the error with high probability (Arar et al., 2023, Ipsen et al., 2019).

Operation	Deterministic Bound	Probabilistic Bound	Typical Improvement
$n$-term summation	$O(nu)$	$O(\sqrt{n}\,u)$	$\sqrt{n}\times$ tighter
$n$-dot product	$O(nu)$	$O(\sqrt{n}\,u)$	$\sqrt{n}\times$ tighter
Matrix-matrix mult.	$O(nu)$, $n$ accumulations	$O(\sqrt{n}\,u)$	Up to about an order of magnitude lower
Pairwise sum/variance	$O(u \log n)$ (RN)	$O(u \sqrt{\log n})$ (SR)	$\sqrt{\log n}\times$ tighter
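The summation rows can be explored with a small experiment. The sketch below emulates binary32 round-to-nearest arithmetic with the standard library and compares recursive and pairwise summation of nonnegative data against the deterministic constants $\gamma_{n-1}$ and $\gamma_{\lceil \log_2 n \rceil}$. Note the assumptions: it uses round-to-nearest throughout (not the stochastic rounding of the last table row), the data are uniform on $[0, 1]$, and all helper names are illustrative.

```python
import math
import random
import struct

U32 = 2.0 ** -24  # unit roundoff of IEEE binary32


def fl32(x: float) -> float:
    """Round a binary64 value to the nearest binary32 value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def recursive_sum32(xs):
    """Left-to-right summation with every partial sum rounded to binary32."""
    s = 0.0
    for x in xs:
        s = fl32(s + x)
    return s


def pairwise_sum32(xs):
    """Pairwise (tree) summation with every partial sum rounded to binary32."""
    if len(xs) == 1:
        return xs[0]
    mid = len(xs) // 2
    return fl32(pairwise_sum32(xs[:mid]) + pairwise_sum32(xs[mid:]))


def gamma(k: int, u: float) -> float:
    return k * u / (1.0 - k * u)


rng = random.Random(2)
n = 4096
xs = [fl32(rng.uniform(0.0, 1.0)) for _ in range(n)]  # inputs exactly representable
exact = math.fsum(xs)                                  # correctly rounded reference

rel_recursive = abs(recursive_sum32(xs) - exact) / exact
rel_pairwise = abs(pairwise_sum32(xs) - exact) / exact
depth = math.ceil(math.log2(n))

print(f"recursive: error/u = {rel_recursive / U32:8.2f}, gamma_(n-1)/u = {gamma(n - 1, U32) / U32:8.2f}")
print(f"pairwise : error/u = {rel_pairwise / U32:8.2f}, gamma_depth/u = {gamma(depth, U32) / U32:8.2f}")
```

Printing the errors in units of $u$ makes it easy to compare them with $\sqrt{n}$ and $\sqrt{\log_2 n}$ by eye.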

6. Numerical Experiments and Practical Impact

Empirical results consistently support the theoretical claims: probabilistic estimators (even at high confidence levels) are often an order of magnitude tighter than deterministic bounds, and sometimes three to six orders of magnitude tighter for very large, low-precision computations (Bhola et al., 2024, Bhola et al., 2024). Numerical experiments for matrix multiplications on NVIDIA Tensor Cores compared the true forward error against the deterministic and probabilistic bounds; the measured values are reported in (Bhola et al., 2024).

Probabilistic estimators directly inform mixed-precision resource allocation: by quantifying rounding errors with tight confidence levels, users can determine the minimum required precision for each component of a large algorithm to meet overall error tolerances, preventing unnecessary use of expensive high-precision computations (Bhola et al., 2024).
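As an illustration of this kind of precision selection, the sketch below picks the cheapest format whose bound for an $n$-term accumulation meets a relative tolerance. Everything here is an assumption made for the example: the cost ordering of the formats, the use of $\gamma_n$ as the deterministic bound, and the first-order Hoeffding radius $u\sqrt{2 n \ln(2/\lambda)}$ as the probabilistic bound. It is not the allocation procedure of the cited work.

```python
import math

# Unit roundoffs for round-to-nearest, ordered from cheapest to most expensive.
# The cost ordering is an assumption of this sketch.
FORMATS = [("bf16", 2.0 ** -8), ("fp16", 2.0 ** -11), ("fp32", 2.0 ** -24), ("fp64", 2.0 ** -53)]


def deterministic_bound(n: int, u: float) -> float:
    """gamma_n, treated as infinite when n u >= 1."""
    return math.inf if n * u >= 1.0 else n * u / (1.0 - n * u)


def probabilistic_bound(n: int, u: float, lam: float) -> float:
    """First-order Hoeffding radius holding with probability at least 1 - lam."""
    return u * math.sqrt(2.0 * n * math.log(2.0 / lam))


def cheapest_format(n: int, tol: float, lam=None):
    """Return the first format meeting `tol`; lam=None selects the deterministic bound."""
    for name, u in FORMATS:
        bound = deterministic_bound(n, u) if lam is None else probabilistic_bound(n, u, lam)
        if bound <= tol:
            return name
    return None


print(cheapest_format(10 ** 6, 1e-3))            # deterministic choice
print(cheapest_format(10 ** 6, 1e-3, lam=0.01))  # probabilistic choice at 99% confidence
```

For $n = 10^6$ and a tolerance of $10^{-3}$, the deterministic criterion in this sketch requires double precision, while the probabilistic one accepts single precision.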

7. Extensions and Limitations

Extensions include incorporation of higher-order statistics (variance-informed bounds), refined all-orders martingale analysis (avoiding truncation to leading-order behavior), consideration of limited-precision stochastic rounding (with explicit control of the number of random bits), and integration with symbolic analysis tools for programs with arbitrary input distributions and static dependencies (Tao et al., 5 May 2026, Constantinides et al., 2021).
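To make the notion of limited-precision stochastic rounding concrete, here is a minimal sketch that rounds a binary64 value to binary32 precision using a fixed number of random bits. It is one simple variant chosen for illustration (the function name, the default `random_bits`, and the thresholding rule are assumptions, not the scheme analyzed in the cited work), and it ignores subnormals, overflow, infinities, and NaNs.

```python
import math
import random


def stochastic_round32(x: float, rng: random.Random, random_bits: int = 16) -> float:
    """Stochastically round x to a 24-bit significand using `random_bits` random bits.

    The value is rounded up with probability ceil(frac * 2**random_bits) / 2**random_bits,
    which differs from the ideal probability `frac` by less than 2**-random_bits.
    """
    mantissa, exponent = math.frexp(x)   # x = mantissa * 2**exponent, 0.5 <= |mantissa| < 1
    scaled = math.ldexp(mantissa, 24)    # binary32 neighbours are the integers on this scale
    lower = math.floor(scaled)
    frac = scaled - lower                # distance from the lower neighbour, in ulps
    threshold = rng.getrandbits(random_bits) / 2 ** random_bits
    if threshold < frac:
        lower += 1                       # round towards the upper neighbour
    return math.ldexp(float(lower), exponent - 24)


rng = random.Random(3)
x = 1.0 + 2.0 ** -25                     # a quarter of the way from 1 to 1 + 2**-23
outputs = [stochastic_round32(x, rng) for _ in range(20_000)]
up_fraction = sum(o > 1.0 for o in outputs) / len(outputs)
mean_output = sum(outputs) / len(outputs)

print(f"fraction rounded up = {up_fraction:.3f}")
print(f"(mean - x) / 2**-23 = {(mean_output - x) / 2.0 ** -23:+.4f}")
```

Because the rounding direction is random with probability proportional to the distance, the sample mean of the rounded values stays close to `x`, which is the mean-zero property the probabilistic analysis relies on.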

Limitations of probabilistic estimators arise when the independence or zero-mean assumptions are violated (e.g., due to catastrophic cancellation, systematic bias, or adversarial input patterns), or when input data and rounding errors are strongly dependent. The general methodology remains robust for broad classes of algorithms where rounding error propagation can be cast in (essentially) mean-independent, zero-mean martingale frameworks (Ipsen et al., 2019, Castro et al., 2024, Bhola et al., 2024).
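A simple failure case is stagnation under round-to-nearest: when a small constant is repeatedly added to a much larger accumulator, every rounding error has the same sign, so the mean-zero assumption does not hold. The sketch below emulates binary32 arithmetic with the standard library; the specific values ($2^{-25}$ added to $1$, $n = 4096$) are chosen for illustration.

```python
import math
import struct

U32 = 2.0 ** -24  # unit roundoff of IEEE binary32


def fl32(x: float) -> float:
    """Round a binary64 value to the nearest binary32 value."""
    return struct.unpack("f", struct.pack("f", x))[0]


n = 4096
small = 2.0 ** -25          # below half the spacing of binary32 numbers near 1

s = 1.0
for _ in range(n):
    s = fl32(s + small)     # every addition rounds back down to 1.0

exact = 1.0 + n * small     # exactly representable in binary64
rel_err = abs(s - exact) / exact

print(f"computed sum          = {s}")
print(f"error / (sqrt(n) u)   = {rel_err / (math.sqrt(n) * U32):.1f}")
print(f"error / (n u)         = {rel_err / (n * U32):.3f}")
```

Here the error grows linearly in $n$: it remains within the deterministic bound but exceeds the $\sqrt{n}\,u$ scale by a wide margin.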

8. Summary

Probabilistic rounding error estimators have transformed the practice of floating-point error analysis by replacing pessimistic worst-case bounds with high-probability guarantees that scale as $O(\sqrt{n}\,u)$ rather than $O(nu)$. Their theoretical foundation lies in treating rounding errors as bounded, zero-mean (and often independent or martingale-difference) random variables, enabling the use of sophisticated concentration inequalities to quantify error growth. These estimators are not only sharper for standard summations and dot-products but extend effectively to mixed-precision kernels (FMA, MPFMA, tensor-core GEMM), polynomial evaluation, variance computation, and beyond. Probabilistic estimators, now central in both research and practice, underpin precision-resource allocation, hardware/software co-design, and the robust deployment of large-scale scientific computing on modern accelerator architectures (Bhola et al., 2024, Guan et al., 2024, Bhola et al., 2024).