Quantized Random Rounding in Low-Precision Arithmetic

Updated 26 May 2026
  • Quantized random rounding is a stochastic method that maps continuous values to discrete grids, ensuring unbiased expectation preservation.
  • It achieves superior error scaling of O(u√n) compared to deterministic rounding, making it effective in numerical linear algebra and deep learning.
  • The approach leverages r-bit randomization in limited-precision settings for efficient, reliable computations on modern hardware.

Quantized random rounding, often referred to as stochastic rounding (SR) with explicit discretization of both the randomization and quantization grids, is a probabilistic rounding scheme that targets unbiasedness and superior error scaling for low-precision arithmetic, particularly in the context of machine learning, numerical linear algebra, and scientific computing. SR and its quantized variants are distinguished by their ability to deliver expectation-preserving quantization even at low bit-widths, facilitate rigorous error control, and bridge algorithmic and hardware requirements for efficient deployment.

1. Formal Definitions and Mathematical Foundations

In quantized random rounding, a real-valued input $x$ is mapped to a discrete quantization grid $\mathbb{F}$ (floating- or fixed-point). For a uniform quantization step $\Delta > 0$, let $\lfloor x \rfloor_\Delta = \Delta \cdot \lfloor x/\Delta \rfloor$ and $\lceil x \rceil_\Delta = \Delta \cdot \lceil x/\Delta \rceil$. The canonical stochastic rounding operator selects either the lower or upper neighbor with probability proportional to its proximity:

$$\operatorname{SR}_\Delta(x) = \begin{cases} \lfloor x \rfloor_\Delta & \text{with probability } 1 - \frac{x - \lfloor x \rfloor_\Delta}{\Delta}, \\ \lceil x \rceil_\Delta & \text{with probability } \frac{x - \lfloor x \rfloor_\Delta}{\Delta}, \end{cases}$$

so that $\mathbb{E}[\operatorname{SR}_\Delta(x)] = x$ for $x \in \mathbb{R}$ (Liu et al., 2 Nov 2025, Arar et al., 6 Mar 2026).
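
The operator above can be sketched in a few lines of Python. This is a minimal illustration for a uniform grid only, not an implementation from the cited papers; the function name `stochastic_round` and the test values are assumptions chosen for the example.

```python
import math
import random

def stochastic_round(x: float, delta: float, rng: random.Random) -> float:
    """Round x to a multiple of delta; round up with probability equal to
    the fractional position of x between its two grid neighbours."""
    lower = math.floor(x / delta) * delta
    frac = (x - lower) / delta          # in [0, 1); 0 when x is on the grid
    if rng.random() < frac:
        return lower + delta
    return lower

rng = random.Random(0)
x, delta = 0.3, 0.25                    # grid neighbours are 0.25 and 0.5
samples = [stochastic_round(x, delta, rng) for _ in range(100_000)]
mean = sum(samples) / len(samples)
print(sorted(set(samples)), mean)       # the sample mean should be close to x
```

Each individual result lies on the grid, while the average over many roundings approaches the unrounded input, which is the expectation-preservation property stated above.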

Limited-precision or quantized SR, denoted $\operatorname{SR}_p^{(r)}$, further quantizes the randomization: $x$ is first rounded to $p + r$ bits, the discretized fractional part is extracted, and the random decision is made with $r$ random bits to approximate the ideal probability. The resulting operator rounds the $(p + r)$-bit representation to $p$ bits, with the stochastic decision made by comparing a uniformly random $r$-bit integer against the discretized threshold. In hardware, $r$ is chosen to balance statistical fidelity against random number generation overhead (Arar et al., 6 Mar 2026).
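
A minimal sketch of the limited-precision variant on a uniform grid follows. It assumes that the fractional part is truncated to $r$ bits and compared with a uniform $r$-bit integer; the name `sr_limited` and the chosen inputs are hypothetical and only serve the illustration.

```python
import math
import random

def sr_limited(x: float, delta: float, r: int, rng: random.Random) -> float:
    """Stochastic rounding to a multiple of delta using only r random bits (r >= 1)."""
    lower = math.floor(x / delta) * delta
    frac = (x - lower) / delta                 # ideal round-up probability
    threshold = math.floor(frac * (1 << r))    # fractional part truncated to r bits
    k = rng.getrandbits(r)                     # uniform integer in [0, 2**r)
    return lower + delta if k < threshold else lower

rng = random.Random(0)
x, delta = 0.3125, 0.25                        # fractional position is exactly 0.25
coarse = [sr_limited(x, delta, 1, rng) for _ in range(1000)]
fine = [sr_limited(x, delta, 8, rng) for _ in range(100_000)]
print(set(coarse))                  # r = 1: the threshold is 0, so it never rounds up
print(sum(fine) / len(fine))        # r = 8: the sample mean is close to x
```

With a single random bit the truncated threshold is zero for this input, so the operator behaves like rounding down; with eight bits the round-up probability matches the ideal one.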

2. Statistical Properties and Error Bounds

The key statistical property of quantized random rounding is expectation preservation, $\mathbb{E}[\operatorname{SR}_\Delta(x)] = x$, which ensures that rounding is unbiased in the mean. When SR is used in summation and reduction kernels, the rounding errors at successive steps form a bounded martingale difference sequence, leading to Azuma–Hoeffding and Chebyshev-type high-probability bounds on the accumulated error that grow as $O(u\sqrt{n})$, up to a factor bounding the partial-sum magnitudes, where $u$ is the unit roundoff and $n$ is the accumulation length (Arar et al., 6 Mar 2026).

This $O(u\sqrt{n})$ scaling (as opposed to $O(un)$ for round-to-nearest) is corroborated for arbitrary discrete grids and higher moments:

  • For a general quantizer applied to a random variable $X$, the deviation of the $k$-th moment of the stochastically rounded version satisfies a bound whose constant depends on the quantization envelope and the probability law of $X$ (Chen, 2020).

  • Variance and second-moment loss can be rigorously bounded using variance-informed probabilistic rounding models, yielding problem-size constants that scale as $\sqrt{n}$ and that are empirically several orders of magnitude tighter than classical deterministic error growth for large $n$ (Bhola et al., 2024).

In practical scenarios with quantized randomization and $r$-bit precision in the stochastic decision, the bias is $O(2^{-r})$ relative to the grid spacing, negligible in high-$r$ regimes but degrading to standard biased rounding for very small $r$ (Arar et al., 6 Mar 2026).
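
The dependence of the bias on the number of random bits can be checked exactly with rational arithmetic. The sketch below assumes the truncation model in which the round-up probability is the fractional position truncated to $r$ bits; `expected_bias` is a hypothetical helper name, and the bias is expressed in units of the grid spacing.

```python
from fractions import Fraction
import math

def expected_bias(frac: Fraction, r: int) -> Fraction:
    """Exact bias E[rounded] - x, in units of the grid spacing, when the
    round-up probability is frac truncated to r bits."""
    threshold = math.floor(frac * 2**r)       # integer in [0, 2**r)
    return Fraction(threshold, 2**r) - frac   # always in (-2**-r, 0]

frac = Fraction(1, 3)                         # value one third of the way up
for r in (1, 2, 4, 8, 16):
    bias = expected_bias(frac, r)
    print(r, float(bias), abs(bias) < Fraction(1, 2**r))
```

Under this model the bias is never positive and its magnitude stays below $2^{-r}$ grid spacings, so each additional random bit roughly halves the worst-case bias.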

3. Theoretical Analysis in SGD and Optimization

Quantized random rounding is analytically advantageous in mini-batch SGD and low-precision neural training:

  • Variance Decomposition: In SR-quantized SGD, the gradient estimator variance has two parts, a sampling variance $\sigma_s^2$ and a quantization variance $\sigma_q^2$. The total variance per entry decays as $1/b$ with batch size $b$:

$$\operatorname{Var}[\hat{g}_i] = \frac{\sigma_s^2 + \sigma_q^2}{b},$$

so larger batches linearly suppress quantization noise (Liu et al., 2 Nov 2025).

  • Unbiasedness and Convergence: SR keeps the stochastic gradient unbiased, so SGD recovers standard convergence rates (for smooth, bounded-variance objectives). Deterministic round-to-nearest (RTN) is systematically biased and introduces an irreducible error term in SGD convergence, whereas the quantization noise from SR vanishes as batch size increases (Liu et al., 2 Nov 2025).
  • Precision–Batch Trade-Off: Reducing the mantissa bit-width $p$ by one bit doubles the quantization step and therefore quadruples ($2^2 = 4\times$) the quantization variance, which can be offset by increasing the batch size by the same factor. This relationship is pivotal for low-precision training on resource-constrained devices.
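
The precision and batch-size trade-off can be illustrated with a small simulation. This is a sketch under simplifying assumptions, not the experimental setup of the cited work: inputs are drawn uniformly from $[0, 1)$, the grid is uniform with step `delta`, each sample is rounded independently, and the names `sr_error`, `batch_noise_variance`, and `predicted` are hypothetical.

```python
import math
import random
import statistics

def sr_error(x, delta, rng):
    """Error SR(x) - x of one stochastic rounding to a multiple of delta."""
    lower = math.floor(x / delta) * delta
    rounded = lower + delta if rng.random() < (x - lower) / delta else lower
    return rounded - x

def batch_noise_variance(delta, b, trials, rng):
    """Empirical variance of the rounding error averaged over a batch of size b."""
    means = []
    for _ in range(trials):
        errors = [sr_error(rng.random(), delta, rng) for _ in range(b)]
        means.append(sum(errors) / b)
    return statistics.pvariance(means)

def predicted(delta, b):
    # The mean of delta^2 * f * (1 - f) over a uniform fractional position f
    # is delta^2 / 6; averaging b independent errors divides it by b.
    return delta**2 / (6 * b)

rng = random.Random(1)
v_fine = batch_noise_variance(2**-3, 16, 2000, rng)    # step 1/8, batch 16
v_coarse = batch_noise_variance(2**-2, 16, 2000, rng)  # one bit fewer, same batch
v_offset = batch_noise_variance(2**-2, 64, 2000, rng)  # one bit fewer, 4x batch
print(v_fine, v_coarse, v_offset)
```

In this toy setting, dropping one bit multiplies the averaged quantization variance by about four, and quadrupling the batch size brings it back to roughly the original level.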

Frameworks such as LOTION apply SR as a smoothing primitive to the quantized loss landscape, yielding a differentiable surrogate and preserving global minima of the original quantized objective. This allows rigorous application of stochastic optimization (SGD/Adam) without off-manifold bias, in contrast to Straight-Through Estimators, which are generally biased and lack convergence guarantees for non-convex, discontinuous objectives (Kwun et al., 9 Oct 2025).

4. Moment Preservation and Limiting Behavior

Quantized random rounding exhibits precise control over moment distortion:

  • First moment: Always preserved (unbiasedness).
  • Higher raw moments: The error in the $k$-th raw moment and the corresponding absolute moment error are bounded in terms of the quantization step. The constants linking quantization error and higher moments may grow rapidly with $k$, implying that deterministic rounding may be preferable where faithful preservation of variance, skewness, or kurtosis is critical (Chen, 2020).
  • Sheppard-type Corrections: For uniformly spaced quantization, stochastic rounding uniquely avoids the systematic bias (Sheppard correction) present in deterministic schemes; for grid, lattice, or block-form quantizers these corrections can be written explicitly (Janson, 9 Apr 2025).

Moment analysis extends to discrete random variables, where randomized rounding (probability matched to the fractional part) achieves exact expectation preservation and, for many practical cases, minimal MSE among rounding schemes.
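
For a single fixed input on a uniform grid, the moments of the rounded value can be computed exactly from its two-point distribution. The sketch below uses rational arithmetic; `sr_raw_moment` is a hypothetical helper, and the example only covers a deterministic input, not the general random-variable setting of the cited analyses.

```python
from fractions import Fraction
import math

def sr_raw_moment(x: Fraction, delta: Fraction, k: int) -> Fraction:
    """Exact k-th raw moment of stochastic rounding of x to a multiple of delta."""
    lower = math.floor(x / delta) * delta
    frac = (x - lower) / delta            # probability of rounding up
    upper = lower + delta
    return (1 - frac) * lower**k + frac * upper**k

x, delta = Fraction(3, 10), Fraction(1, 4)
frac = Fraction(1, 5)                     # position of x between 1/4 and 1/2
for k in (1, 2, 3, 4):
    print(k, sr_raw_moment(x, delta, k) - x**k)   # moment error for each k
```

The first-moment error is exactly zero, while the second-moment error equals the rounding variance $\Delta^2 f(1 - f)$, where $f$ is the fractional position; higher moments pick up further positive error in this example.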

5. Practical Implementation, Hardware, and Guidelines

Quantized random rounding is efficient to implement and widely adopted across hardware:

  • Random Bit Resource: For reductions over sequences of length $n$, a number of random bits $r$ chosen according to $n$ suffices for negligible bias. PRNGs such as LFSRs can produce random bits on-chip with very little area overhead; industry platforms set $r$ depending on the operation and performance target (Arar et al., 6 Mar 2026).
  • Placement in Kernels: Best practice is to employ SR when casting high-precision intermediate results to lower-precision outputs (e.g., during accumulator flushes or gradient write-back). For inference, deterministic modes may be preferred for reproducibility, while for training and accumulation, SR gives optimal error scaling (Arar et al., 6 Mar 2026).
  • Hardware Integration: Modern AI/ML accelerators (NVIDIA, AMD, Graphcore) provide hardware support for limited-precision random rounding, enabling aggressive quantization (INT4/INT8) without stagnation or bias-driven accuracy loss.
  • Microbatching: In edge training (LLMs and vision models), smaller mantissa can be compensated by increasing the microbatch size; this unlocks high compute density without sacrificing convergence or statistical fidelity (Liu et al., 2 Nov 2025).
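
As an illustration of the random-bit source mentioned above, the following sketch drives an $r$-bit rounding decision from a software model of a 16-bit Fibonacci LFSR. The tap positions (16, 14, 13, 11) are a standard maximal-length choice; the class `Lfsr16` and the functions `sr_with_lfsr` and `period` are hypothetical names, and the sketch does not model any specific accelerator. Consecutive LFSR outputs are correlated, so this is only a rough stand-in for a hardware design.

```python
import math

class Lfsr16:
    """16-bit Fibonacci LFSR with taps 16, 14, 13, 11 (period 2**16 - 1)."""

    def __init__(self, seed: int = 0xACE1):
        if seed & 0xFFFF == 0:
            raise ValueError("seed must be nonzero")
        self.state = seed & 0xFFFF

    def next_bit(self) -> int:
        s = self.state
        feedback = (s ^ (s >> 2) ^ (s >> 3) ^ (s >> 5)) & 1
        out = s & 1
        self.state = (s >> 1) | (feedback << 15)
        return out

    def next_bits(self, r: int) -> int:
        """Collect r successive output bits into an integer in [0, 2**r)."""
        value = 0
        for _ in range(r):
            value = (value << 1) | self.next_bit()
        return value

def sr_with_lfsr(x: float, delta: float, r: int, lfsr: Lfsr16) -> float:
    """r-bit stochastic rounding to a multiple of delta, random bits from the LFSR."""
    lower = math.floor(x / delta) * delta
    threshold = math.floor((x - lower) / delta * (1 << r))
    return lower + delta if lfsr.next_bits(r) < threshold else lower

def period(seed: int = 0xACE1) -> int:
    """Number of steps until the LFSR state returns to the seed."""
    gen = Lfsr16(seed)
    steps = 0
    while True:
        gen.next_bit()
        steps += 1
        if gen.state == seed:
            return steps

gen = Lfsr16()
rounded = [sr_with_lfsr(0.3, 0.25, 8, gen) for _ in range(1000)]
print(period(), sorted(set(rounded)))
```

The generator needs only shifts and XORs per bit, which is why such registers are attractive as an on-chip source of rounding bits.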

6. Applications and Empirical Validation

Quantized random rounding is validated in diverse settings:

  • Deep Learning: SR enables stable low-precision training, especially at INT4/INT8 formats. Experiments confirm that with correct batch/precision scaling, accuracy typical of full-precision models is recoverable (e.g., BERT-GLUE/LLM finetuning) (Liu et al., 2 Nov 2025).
  • Numerical Linear Algebra: Summation, dot-product, matrix-vector multiplication, and LU solutions using SR exhibit rounding error growth of $O(u\sqrt{n})$, several orders of magnitude smaller than classical error bounds. Hardware calibration matches the constants predicted by variance-driven bounds (Bhola et al., 2024).
  • Scientific Modeling: Climate simulations and long-time evolution tasks maintain statistical properties when using SR, as opposed to loss of fidelity with deterministic round-to-nearest; this is crucial for modeling reliability in predictive and chaotic systems (Arar et al., 6 Mar 2026).

Empirical validation extends to statistical kernels, PDE solvers, and signal processing, with moment error always consistent with theoretical bounds, and in large-scale reductions SR outperforms all deterministic modes in bias and variance (Janson, 9 Apr 2025, Chen, 2020).

7. Comparison to Alternative Rounding Schemes

A comparative summary:

| Rounding mode | Unbiased? | Error growth | Hardware cost |
|---|---|---|---|
| Round-to-nearest | No | $O(un)$ | None |
| Stochastic (quantized) | Yes ($r$ large) | $O(u\sqrt{n})$ | $r$-bit PRNG, LFSR |
| Floor/Ceil | No; one-sided | $O(un)$, biased | None |
| Data-dependent (DiscQuant) | Yes (on calibration set) | low discrepancy | PRNG, calibration |
| Classical randomized | Yes (coordinate-wise) | $O(u\sqrt{n})$ | PRNG |

Quantized random rounding maintains bias-free summation properties; deterministic alternatives accumulate systematic error (bias), degrade for deep reductions, and may stagnate when adding sub-threshold summands (causing loss of signal in neural gradients and physical simulations) (Arar et al., 6 Mar 2026, Liu et al., 2 Nov 2025).
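
The stagnation effect can be reproduced with a toy accumulator. The sketch below assumes a uniform grid with step `delta` and a constant addend smaller than half a step; the names `round_nearest` and `stochastic_round` and all numerical values are illustrative choices, not taken from the cited papers.

```python
import math
import random

def round_nearest(x: float, delta: float) -> float:
    """Deterministic rounding to the nearest multiple of delta (ties to even)."""
    return round(x / delta) * delta

def stochastic_round(x: float, delta: float, rng: random.Random) -> float:
    """Unbiased stochastic rounding to a multiple of delta."""
    lower = math.floor(x / delta) * delta
    return lower + delta if rng.random() < (x - lower) / delta else lower

rng = random.Random(0)
delta = 2**-4            # accumulator grid spacing
addend = 0.01            # below delta / 2, so round-to-nearest drops it
n = 5000

acc_rn = acc_sr = 1.0
for _ in range(n):
    acc_rn = round_nearest(acc_rn + addend, delta)
    acc_sr = stochastic_round(acc_sr + addend, delta, rng)

exact = 1.0 + n * addend
print(exact, acc_rn, acc_sr)
```

The round-to-nearest accumulator never leaves its initial value because every update rounds back down, whereas the stochastically rounded accumulator follows the exact sum up to a fluctuation that grows like the square root of the number of additions.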
