Quantized Random Rounding in Low-Precision Arithmetic
- Quantized random rounding is a stochastic method that maps continuous values onto discrete grids while preserving their expected value.
- It achieves superior error scaling of O(u√n) compared to deterministic rounding, making it effective in numerical linear algebra and deep learning.
- The approach leverages r-bit randomization in limited-precision settings for efficient, reliable computations on modern hardware.
Quantized random rounding, often referred to as stochastic rounding (SR) with explicit discretization of both the randomization and quantization grids, is a probabilistic rounding scheme that targets unbiasedness and superior error scaling for low-precision arithmetic, particularly in the context of machine learning, numerical linear algebra, and scientific computing. SR and its quantized variants are distinguished by their ability to deliver expectation-preserving quantization even at low bit-widths, facilitate rigorous error control, and bridge algorithmic and hardware requirements for efficient deployment.
1. Formal Definitions and Mathematical Foundations
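In quantized random rounding, a real-valued input $x$ is mapped to a discrete quantization grid (floating- or fixed-point). For a uniform quantization step $\Delta$, let $\lfloor x \rfloor_\Delta = \Delta \lfloor x/\Delta \rfloor$ and $\lceil x \rceil_\Delta = \lfloor x \rfloor_\Delta + \Delta$ denote the lower and upper grid neighbors of $x$. The canonical stochastic rounding operator selects either neighbor with probability proportional to its proximity:
$$\mathrm{SR}(x) = \begin{cases} \lceil x \rceil_\Delta & \text{with probability } (x - \lfloor x \rfloor_\Delta)/\Delta, \\ \lfloor x \rfloor_\Delta & \text{otherwise,} \end{cases}$$
so that $\mathbb{E}[\mathrm{SR}(x)] = x$ for every $x$ in the representable range (Liu et al., 2 Nov 2025, Arar et al., 6 Mar 2026).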
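Limited-precision or quantized SR, denoted $\mathrm{SR}_r$ here, further quantizes the randomization: $x$ is first rounded to an intermediate precision carrying $r$ extra bits beyond the target grid, the discretized fractional part is extracted, and the random decision is made with $r$ random bits to approximate the ideal probability. Writing $\hat{x}$ for the intermediate value, $\Delta$ for the target grid step, and $\lfloor \cdot \rfloor_\Delta$, $\lceil \cdot \rceil_\Delta$ for the lower and upper grid neighbors, this results in a rounding operator
$$\mathrm{SR}_r(x) = \begin{cases} \lceil \hat{x} \rceil_\Delta & \text{if } k < t, \\ \lfloor \hat{x} \rfloor_\Delta & \text{otherwise,} \end{cases} \qquad t = 2^r \, \frac{\hat{x} - \lfloor \hat{x} \rfloor_\Delta}{\Delta},$$
where the rounding acts on the intermediate representation $\hat{x}$ and the stochastic decision is made from a uniformly random integer $k \in \{0, \dots, 2^r - 1\}$ compared to the discretized threshold $t$. In hardware implementations, the choice of $r$ balances statistical fidelity against random number generation overhead (Arar et al., 6 Mar 2026).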
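The two operators described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the implementation from the cited works: the function names `sr` and `sr_limited` are hypothetical, the grid is uniform with step `delta`, and the limited-precision variant simply truncates the round-up probability to a multiple of `2**-r` (with `r >= 1`).

```python
import math
import random

def sr(x, delta, rng=random):
    """Ideal stochastic rounding of x onto the uniform grid {k * delta}."""
    lo = math.floor(x / delta) * delta      # lower grid neighbor
    frac = (x - lo) / delta                 # in [0, 1): ideal round-up probability
    return lo + delta if rng.random() < frac else lo

def sr_limited(x, delta, r, rng=random):
    """Stochastic rounding driven by only r random bits (assumes r >= 1).

    The round-up probability is truncated to threshold / 2**r, so the
    result is slightly biased toward the lower neighbor when r is small.
    """
    lo = math.floor(x / delta) * delta
    frac = (x - lo) / delta
    threshold = math.floor(frac * 2**r)     # discretized threshold in {0, ..., 2**r - 1}
    k = rng.getrandbits(r)                  # uniform integer in {0, ..., 2**r - 1}
    return lo + delta if k < threshold else lo

if __name__ == "__main__":
    rng = random.Random(0)
    samples = [sr(0.3, 0.25, rng) for _ in range(10_000)]
    print("empirical mean of sr(0.3):", sum(samples) / len(samples))
```

With `r = 3` and `x = 0.3` on a grid of step `0.25`, the truncated probability is 1/8 instead of 0.2, so the expected output is 0.28125 rather than 0.3; this is the kind of residual bias that shrinks as `r` grows.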
2. Statistical Properties and Error Bounds
The key statistical property of quantized random rounding is expectation preservation, $\mathbb{E}[\mathrm{SR}(x)] = x$, ensuring that rounding is unbiased in the mean. When used in summation and reduction kernels, the rounding error $\delta_k$ at step $k$ forms a bounded martingale difference sequence, leading to Azuma–Hoeffding and Chebyshev-type high-probability bounds of the form $|\hat{s}_n - s_n| = O(u\,M\sqrt{n})$, where $\hat{s}_n$ and $s_n$ are the computed and exact sums, $u$ is the unit roundoff of the target format, $M$ bounds the partial sum magnitudes, and $n$ is the accumulation length (Arar et al., 6 Mar 2026).
This $O(u\sqrt{n})$ scaling (as opposed to $O(un)$ for round-to-nearest) is corroborated for arbitrary discrete grids and higher moments:
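- For a general quantizer $\mathrm{rd}: \mathbb{R} \to \mathbb{F}$ and $\mathrm{rd}(X)$ as the stochastically rounded version of a random variable $X$, the $k$-th moment deviation satisfies
$$\left| \mathbb{E}[X^k] - \mathbb{E}[\mathrm{rd}(X)^k] \right| < C\,\epsilon^2$$
for a constant $C$ depending on the quantization envelope and the probability law of $X$, where $\epsilon$ bounds the rounding error relative to that envelope (Chen, 2020).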
- Variance and second-moment loss can be rigorously bounded using variance-informed probabilistic rounding models, yielding problem-size constants that scale as $\sqrt{n}$ rather than $n$ and that are empirically shown to improve on classical deterministic error bounds by several orders of magnitude for large $n$ (Bhola et al., 2024).
In practical scenarios where the stochastic decision is discretized to $r$ bits of precision, the bias is $O(2^{-r})$ relative to the grid step: negligible in high-$r$ regimes, but degrading toward standard biased rounding for very small $r$ (Arar et al., 6 Mar 2026).
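The difference between linear and square-root error growth can be illustrated with a toy fixed-point accumulator. The sketch below is only an illustration under stated assumptions: the function name `accumulate`, the grid step, and the constant addend are hypothetical choices, and the model is a uniform fixed-point grid rather than the floating-point setting analysed in the cited works. Each addend sits at 0.7 of a grid step, so round-to-nearest makes the same error on every step, while the errors of stochastic rounding largely cancel.

```python
import math
import random

def accumulate(values, delta, mode, rng):
    """Sum `values`, rounding the running sum to the grid {k * delta} after each add."""
    s = 0.0
    for v in values:
        t = (s + v) / delta
        lo = math.floor(t)
        if mode == "rn":                      # round-to-nearest (ties rounded up)
            q = math.floor(t + 0.5)
        else:                                 # stochastic rounding
            q = lo + (1 if rng.random() < t - lo else 0)
        s = q * delta
    return s

rng = random.Random(0)
delta = 2.0 ** -8                 # hypothetical grid step
n = 10_000
values = [0.7 * delta] * n        # every addend lands strictly between grid points
exact = 0.7 * delta * n

err_rn = abs(accumulate(values, delta, "rn", rng) - exact)
err_sr = abs(accumulate(values, delta, "sr", rng) - exact)
print(f"round-to-nearest error: {err_rn:.4f}")
print(f"stochastic rounding error: {err_sr:.4f}")
```

In this setup the round-to-nearest error is `0.3 * n * delta`, growing linearly in `n`, whereas the stochastic rounding error is a binomial fluctuation whose standard deviation grows like `sqrt(n)`.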
3. Theoretical Analysis in SGD and Optimization
Quantized random rounding is analytically advantageous in mini-batch SGD and low-precision neural training:
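- Variance Decomposition: In SR-quantized SGD, gradient estimator variance has two parts: sampling variance (written here as $\sigma_{\mathrm{s}}^2$) and quantization variance ($\sigma_{\mathrm{q}}^2$). The total variance per entry decays as $1/B$ with batch size $B$:
$$\mathrm{Var}[\hat{g}_i] \approx \frac{\sigma_{\mathrm{s}}^2 + \sigma_{\mathrm{q}}^2}{B},$$
so larger batches linearly suppress quantization noise (Liu et al., 2 Nov 2025).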
- Unbiasedness and Convergence: SR keeps the stochastic gradient unbiased, so SGD recovers standard convergence rates (for smooth, bounded-variance objectives). Deterministic round-to-nearest (RTN) is systematically biased and introduces an irreducible error term in SGD convergence, whereas the extra noise introduced by SR is averaged out as batch size increases (Liu et al., 2 Nov 2025).
- Precision–Batch Trade-Off: Reducing the mantissa bit-width by one bit doubles the grid step and therefore quadruples ($4\times$) the quantization variance, which can be offset by increasing batch size by the same factor. This relationship is pivotal for low-precision training on resource-constrained devices.
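The precision and batch-size trade-off in the list above follows from a short variance calculation. The derivation below is a sketch under stated assumptions: a uniform grid with step $\Delta$ proportional to $2^{-p}$, where $p$ (mantissa width) and $f$ (fractional position of $x$ between its grid neighbors, with $\lfloor x \rfloor_\Delta$ the lower neighbor) are notation introduced here for illustration, and the $B$ values in a batch are rounded independently.

```latex
\begin{aligned}
\mathrm{Var}[\mathrm{SR}(x)] &= \Delta^2 f(1-f) \le \frac{\Delta^2}{4},
  \qquad f = \frac{x - \lfloor x \rfloor_\Delta}{\Delta}, \\
p \to p-1 :\quad \Delta \to 2\Delta
  &\;\Longrightarrow\; \frac{\Delta^2}{4} \to \frac{(2\Delta)^2}{4} = 4 \cdot \frac{\Delta^2}{4}, \\
\mathrm{Var}\!\left[\frac{1}{B}\sum_{j=1}^{B} \mathrm{SR}(x_j)\right]
  &\le \frac{\Delta^2}{4B},
  \qquad \frac{(2\Delta)^2}{4 \cdot (4B)} = \frac{\Delta^2}{4B}.
\end{aligned}
```

The last line shows that the variance bound at precision $p-1$ with batch size $4B$ matches the bound at precision $p$ with batch size $B$.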
Frameworks such as LOTION apply SR as a smoothing primitive to the quantized loss landscape, yielding a differentiable surrogate and preserving global minima of the original quantized objective. This allows rigorous application of stochastic optimization (SGD/Adam) without off-manifold bias, in contrast to Straight-Through Estimators, which are generally biased and lack convergence guarantees for non-convex, discontinuous objectives (Kwun et al., 9 Oct 2025).
4. Moment Preservation and Limiting Behavior
Quantized random rounding exhibits precise control over moment distortion:
- First moment: Always preserved (unbiasedness).
- Higher raw moments: The error in the $k$-th raw moment, and likewise in the absolute moment, is bounded in terms of the quantization error. The constants linking quantization error and higher moments may grow rapidly with $k$, implying that deterministic rounding may be preferable where faithful preservation of variance, skewness, or kurtosis is critical (Chen, 2020).
- Sheppard-type Corrections: For uniformly spaced quantization, stochastic rounding uniquely avoids the systematic bias (Sheppard correction) present in deterministic schemes; for grid, lattice, or block-form quantizers these corrections can be written explicitly (Janson, 9 Apr 2025).
Moment analysis extends to discrete random variables, where randomized rounding (probability matched to the fractional part) achieves exact expectation preservation and, for many practical cases, minimal MSE among rounding schemes.
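For a discrete random variable, the law of the stochastically rounded value can be computed exactly, which makes the moment behaviour easy to check. The sketch below uses Python's `fractions` module for exact arithmetic; the helper names `sr_distribution` and `moment` and the example distribution `X` are hypothetical and chosen only for illustration.

```python
from fractions import Fraction as F
from math import floor

def sr_distribution(dist, delta):
    """Exact law of SR(X) for a discrete X given as {value: probability}."""
    out = {}
    for x, p in dist.items():
        lo = floor(x / delta) * delta          # lower grid neighbor
        f = (x - lo) / delta                   # probability of rounding up
        for y, w in ((lo, 1 - f), (lo + delta, f)):
            if w:                              # skip zero-probability outcomes
                out[y] = out.get(y, 0) + p * w
    return out

def moment(dist, k):
    """k-th raw moment of a discrete distribution {value: probability}."""
    return sum(p * x**k for x, p in dist.items())

# Hypothetical example: X takes three off-grid values; the grid is the integers.
X = {F(1, 4): F(1, 2), F(3, 2): F(1, 3), F(27, 10): F(1, 6)}
Y = sr_distribution(X, F(1))

print("E[X]   =", moment(X, 1), " E[SR(X)]   =", moment(Y, 1))
print("E[X^2] =", moment(X, 2), " E[SR(X)^2] =", moment(Y, 2))
```

The first moments agree exactly, while the second moment of the rounded variable exceeds that of `X` by the average of `f * (1 - f) * delta**2`, which is the variance added by the rounding itself.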
5. Practical Implementation, Hardware, and Guidelines
Quantized random rounding is efficient to implement and widely adopted across hardware:
- Random Bit Resource: For a given sequence length $n$ (e.g., in reductions), a sufficiently large number of random bits $r$ keeps the bias negligible. PRNGs such as LFSRs can produce random bits on-chip with very little area overhead; industry platforms choose $r$ depending on the operation and performance target (Arar et al., 6 Mar 2026).
- Placement in Kernels: Best practice is to employ SR when casting high-precision intermediate results to lower-precision outputs (e.g., during accumulator flushes or gradient write-back). For inference, deterministic modes may be preferred for reproducibility, while for training and accumulation, SR gives optimal error scaling (Arar et al., 6 Mar 2026).
- Hardware Integration: Modern AI/ML accelerators (NVIDIA, AMD, Graphcore) provide hardware support for limited-precision random rounding, enabling aggressive quantization (INT4/INT8) without stagnation or bias-driven accuracy loss.
- Microbatching: In edge training (LLMs and vision models), smaller mantissa can be compensated by increasing the microbatch size; this unlocks high compute density without sacrificing convergence or statistical fidelity (Liu et al., 2 Nov 2025).
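The random-bit source and the write-back placement described in the list above can be sketched together. This is an illustrative software model only: the class `Lfsr16` and the function `sr_writeback` are hypothetical names, the 16-bit Fibonacci LFSR uses a textbook maximal-length tap set (16, 14, 13, 11) rather than anything taken from the cited hardware, and the accumulator is a Python float standing in for a high-precision register.

```python
import math

class Lfsr16:
    """16-bit Fibonacci LFSR with taps 16, 14, 13, 11 (period 65535)."""

    def __init__(self, seed=0xACE1):
        if seed & 0xFFFF == 0:
            raise ValueError("seed must be nonzero")
        self.state = seed & 0xFFFF

    def next_bit(self):
        s = self.state
        fb = (s ^ (s >> 2) ^ (s >> 3) ^ (s >> 5)) & 1   # feedback bit from the taps
        self.state = (s >> 1) | (fb << 15)
        return s & 1                                    # bit shifted out

    def bits(self, r):
        """Assemble an r-bit integer from r successive output bits."""
        v = 0
        for _ in range(r):
            v = (v << 1) | self.next_bit()
        return v

def sr_writeback(acc, delta, r, lfsr):
    """Round a high-precision accumulator to the grid {k * delta} using r LFSR bits."""
    t = acc / delta
    lo = math.floor(t)
    threshold = math.floor((t - lo) * 2**r)             # discretized round-up threshold
    return (lo + (1 if lfsr.bits(r) < threshold else 0)) * delta

if __name__ == "__main__":
    gen = Lfsr16()
    outs = [sr_writeback(0.3, 0.25, 8, gen) for _ in range(4096)]
    print("fraction rounded up:", outs.count(0.5) / len(outs))
```

Rounding is applied once, when the accumulator is flushed to the low-precision grid, so the cost is `r` LFSR steps per stored value rather than per arithmetic operation.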
6. Applications and Empirical Validation
Quantized random rounding is validated in diverse settings:
- Deep Learning: SR enables stable low-precision training, especially at INT4/INT8 formats. Experiments confirm that with correct batch/precision scaling, accuracy typical of full-precision models is recoverable (e.g., BERT-GLUE/LLM finetuning) (Liu et al., 2 Nov 2025).
- Numerical Linear Algebra: Summation, dot-product, matrix-vector multiplication, and LU solutions using SR exhibit rounding error growth of $O(u\sqrt{n})$, several orders of magnitude smaller than classical error bounds. Calibration on hardware matches the constants predicted by variance-informed bounds (Bhola et al., 2024).
- Scientific Modeling: Climate simulations and long-time evolution tasks maintain statistical properties when using SR, as opposed to loss of fidelity with deterministic round-to-nearest; this is crucial for modeling reliability in predictive and chaotic systems (Arar et al., 6 Mar 2026).
Empirical validation extends to statistical kernels, PDE solvers, and signal processing, with moment error always consistent with theoretical bounds, and in large-scale reductions SR outperforms all deterministic modes in bias and variance (Janson, 9 Apr 2025, Chen, 2020).
7. Comparison to Alternative Rounding Schemes
A comparative summary:
| Rounding mode | Unbiased? | Error growth | Hardware cost |
|---|---|---|---|
| Round-to-nearest | No | $O(un)$ | None |
| Stochastic (quantized) | Yes ($r$ large) | $O(u\sqrt{n})$ | $r$-bit PRNG, LFSR |
| Floor/Ceil | No; one-sided | $O(un)$, biased | None |
| Data-dependent (DiscQuant) | Yes (on calibration set) | Low discrepancy | PRNG, calibration |
| Classical randomized | Yes (coordinate-wise) | $O(u\sqrt{n})$ | PRNG |
Quantized random rounding maintains bias-free summation properties; deterministic alternatives accumulate systematic error (bias), degrade for deep reductions, and may stagnate when adding sub-threshold summands (causing loss of signal in neural gradients and physical simulations) (Arar et al., 6 Mar 2026, Liu et al., 2 Nov 2025).
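The stagnation effect mentioned above can be reproduced with a toy weight update. The sketch below is a hypothetical illustration, not an experiment from the cited works: the helper `quantize`, the grid step, and the update size are assumptions, and the "gradient step" is a constant smaller than half a grid step so that round-to-nearest always snaps back to the starting value.

```python
import math
import random

def quantize(x, delta, stochastic, rng):
    """Round x to the grid {k * delta}, either stochastically or to nearest."""
    t = x / delta
    lo = math.floor(t)
    if stochastic:
        q = lo + (1 if rng.random() < t - lo else 0)
    else:
        q = math.floor(t + 0.5)
    return q * delta

rng = random.Random(1)
delta = 2.0 ** -6            # hypothetical weight grid step
step = 0.1 * delta           # update smaller than delta / 2
n_steps = 2000

w_rn = w_sr = 1.0
for _ in range(n_steps):
    w_rn = quantize(w_rn - step, delta, False, rng)   # stagnates at 1.0
    w_sr = quantize(w_sr - step, delta, True, rng)    # moves on average by `step`

print("exact target:     ", 1.0 - n_steps * step)
print("round-to-nearest: ", w_rn)
print("stochastic:       ", w_sr)
```

Round-to-nearest leaves the weight unchanged on every step, while stochastic rounding tracks the exact trajectory up to a random fluctuation of a few grid steps.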
References
- "Training with Fewer Bits: Unlocking Edge LLMs Training with Stochastic Rounding" (Liu et al., 2 Nov 2025)
- "Limited-Precision Stochastic Rounding" (Arar et al., 6 Mar 2026)
- "Non-asymptotic moment bounds for random variables rounded to non-uniformly spaced sets" (Chen, 2020)
- "Exploiting Higher-Order Statistics for Robust Probabilistic Rounding Error Analysis" (Bhola et al., 2024)
- "Rounding of discrete variables" (Janson, 9 Apr 2025)
- "LOTION: Smoothing the Optimization Landscape for Quantized Training" (Kwun et al., 9 Oct 2025)
- "DiscQuant: A Quantization Method for Neural Networks Inspired by Discrepancy Theory" (Chee et al., 11 Jan 2025)