Automatic Dynamic Precision (ADP)
- Automatic Dynamic Precision (ADP) is a family of techniques that dynamically adjusts numeric precision at runtime, balancing performance, energy, and accuracy across computing systems.
- Architectural approaches like the ADiP systolic array enable adaptive bitwidth configurations, achieving up to 4× higher compute density while managing trade-offs in area and power consumption.
- Algorithmic and data-structural methods, including dynamic slicing and live bit-certification, allow precision to be tuned on-the-fly for optimized GEMM, simulations, and stochastic optimization.
Automatic Dynamic Precision (ADP) refers to a family of algorithmic, architectural, and data-structural techniques in scientific computing, numerical linear algebra, machine learning hardware, and simulation-based optimization that enable the dynamic adjustment of numeric precision at runtime. Unlike static quantization or fixed precision, ADP operates by selectively tuning arithmetic precision, bitwidth, or stochastic noise levels either at the level of hardware (parallel processors, systolic arrays), numerical algorithms (matrix decompositions, GEMM), or software primitives (floating-point representations, numpy array kernels), in response to application demands, input statistics, or real-time numerical reliability requirements. This paradigm has demonstrated significant improvements in computational efficiency, energy consumption, and accuracy maintenance across deep learning, transformer inference, scientific PDE solvers, and high-fidelity optimization workflows.
1. Architectural Approaches to ADP in Hardware Accelerators
One prominent realization of ADP is the ADiP (Adaptive Precision Systolic Array) architecture, designed for matrix multiplication in transformer workloads (Abdelmaksoud et al., 12 Oct 2025). The ADiP architecture employs a two-dimensional systolic array of adaptive-precision processing elements (PEs). Each PE consists of sixteen 2-bit multipliers (D-MULs), which can be time-multiplexed or parallelized depending on the global precision mode signal. Precision is switched via operand interleaving: for 8b8b operation, weights are loaded as 8-bit tiles; for 8b4b, each PE receives two interleaved 4-bit sub-tiles; for 8b2b, up to four 2-bit sub-tiles are exploited, driving maximal D-MUL utilization.
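The operand-interleaving idea can be checked with a small integer model. An 8-bit × 8-bit product decomposes into 4 × 4 = 16 two-bit partial products, an 8-bit × 4-bit product into 8, and an 8-bit × 2-bit product into 4. The same sixteen D-MULs can therefore serve one, two, or four products per cycle. This is a hypothetical Python sketch: it handles unsigned operands only, and `digits2`, `pe_multiply`, and `products_per_cycle` are illustrative names. It is not ADiP's RTL, which must also handle signed arithmetic and the shared shift/accumulate logic.

```python
# Hypothetical software model of ADiP-style operand interleaving (unsigned operands only).
D_MULS_PER_PE = 16  # sixteen 2-bit multipliers per PE, as described in the text

def digits2(x, width):
    """Split an unsigned `width`-bit integer into 2-bit digits, least significant first."""
    if not 0 <= x < (1 << width):
        raise ValueError(f"{x} does not fit in {width} unsigned bits")
    return [(x >> s) & 0b11 for s in range(0, width, 2)]

def pe_multiply(a, w, w_bits):
    """Multiply an 8-bit activation by a w_bits-bit weight using only 2b x 2b partial products.

    Returns the product and the number of 2-bit multiplications (D-MUL uses) it needed.
    """
    acc, used = 0, 0
    for i, ad in enumerate(digits2(a, 8)):
        for j, wd in enumerate(digits2(w, w_bits)):
            acc += (ad * wd) << (2 * (i + j))  # one D-MUL result, shifted to its binary weight
            used += 1
    return acc, used

def products_per_cycle(w_bits):
    """How many independent 8b x w_bits products the sixteen D-MULs can serve at once."""
    _, used = pe_multiply(0, 0, w_bits)
    return D_MULS_PER_PE // used

for w_bits, w in [(8, 173), (4, 13), (2, 3)]:
    product, used = pe_multiply(200, w, w_bits)
    print(f"8b{w_bits}b: 200*{w} = {product} using {used} D-MULs; "
          f"{products_per_cycle(w_bits)} product(s) per PE")
```

The shift amount `2 * (i + j)` is what a column-wise shift/accumulate unit would apply to each partial product. Narrower weights simply leave D-MULs free for additional interleaved sub-tiles.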
A global 2-bit precision control signal orchestrates bit-path demultiplexers and shared column-wise shift/accumulate units, dictating both input dispatch and partial-sum (psum) post-processing. ADiP supports both symmetric (single-matrix) and asymmetric (multi-matrix, shared-activation) GEMM, which increases PE utilization and enables fine-grained trade-offs between throughput and energy.
Peak throughput for a 4096-PE array reaches 8.192, 16.384, and 32.768 TOPS for 8b8b, 8b4b, and 8b2b precision, respectively. Design-space analysis shows up to 4× higher compute density and up to 53.6% lower workload latency, albeit with an area and power overhead relative to fixed-8b systolic arrays.
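The peak figures follow from simple arithmetic under three assumptions: a 1 GHz clock, each multiply-accumulate counted as two operations, and each PE completing one, two, or four products per cycle in the three modes. The clock frequency is an assumption chosen because it reproduces the quoted numbers. It is not a figure taken from the paper.

```python
# Back-of-the-envelope peak-throughput model; CLOCK_HZ is an assumption, not a quoted spec.
NUM_PES = 4096
CLOCK_HZ = 1.0e9   # assumed 1 GHz clock
OPS_PER_MAC = 2    # count each multiply-accumulate as two operations

def peak_tops(products_per_pe_per_cycle):
    """Peak throughput of the whole array in tera-operations per second."""
    return NUM_PES * products_per_pe_per_cycle * OPS_PER_MAC * CLOCK_HZ / 1e12

for mode, products in [("8b8b", 1), ("8b4b", 2), ("8b2b", 4)]:
    print(f"{mode}: {peak_tops(products):.3f} TOPS")
```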
2. Algorithmic ADP: Dynamic Slicing and Emulation for High-Precision GEMM
ADP also appears at the algorithmic level, in matrix multiplication on low-precision hardware. Schwarz et al. introduce an ADP framework that leverages extended Ozaki decompositions on GPU tensor cores, enabling guaranteed FP64-accuracy GEMM via dynamic, hardware-agnostic runtime selection of “slices” (Schwarz et al., 16 Nov 2025).
The core estimator is the Exponent Span Capacity (ESC), which conservatively computes the minimal integer-slice count $s$ needed to represent a 53-bit FP64 mantissa plus additional “exponent headroom” arising from potentially wide operand exponents: $s = \lceil (53 + h)/b \rceil$, with $h \ge 0$ the exponent headroom estimated from the operands and $b$ the slice bitwidth (e.g., $b = 8$ for INT8).
ADP integrates this runtime estimation with an unsigned integer slicing scheme and CUDA-only kernels for the full emulation workflow (NaN/Inf scans, ESC, and speedup heuristics), falling back to native FP64 only where emulation is unsafe or slower. On an NVIDIA RTX Pro 6000 Blackwell SE, the design reports speedups over native FP64 with less than 10% runtime overhead attributable to the accuracy safeguards, and it passes all standard BLAS grading tests.
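The following minimal NumPy sketch applies the slicing idea to a single dot product. It rests on several assumptions:

- a hypothetical headroom estimate, namely the spread of binary exponents among nonzero entries;
- sign-magnitude splitting into unsigned `b`-bit slices with a shared power-of-two scale;
- int64 dot products standing in for tensor-core integer GEMMs;
- a hypothetical `max_slices` budget that, together with a NaN/Inf scan, triggers the native FP64 fallback.

None of the names (`exponent_span`, `slice_count`, `split_unsigned`, `reconstruct`, `emulated_dot`) come from the authors' code.

```python
import math
import numpy as np

MANTISSA_BITS = 53  # FP64 significand precision, including the implicit bit

def exponent_span(v):
    """Hypothetical headroom estimate: spread of binary exponents among nonzero entries."""
    nz = np.abs(v[v != 0])
    if nz.size == 0:
        return 0
    _, exps = np.frexp(nz)
    return int(exps.max() - exps.min())

def slice_count(v, b=8):
    """ESC-style conservative slice count: ceil((53 + headroom) / b)."""
    return math.ceil((MANTISSA_BITS + exponent_span(v)) / b)

def split_unsigned(v, k, b=8):
    """Split v into signs, k unsigned b-bit integer slices, and a shared power-of-two exponent."""
    mag = np.abs(v)
    peak = mag.max()
    emax = int(np.frexp(peak)[1]) if peak > 0 else 0
    r = np.ldexp(mag, -emax)            # exact power-of-two scaling: every entry is now < 1
    slices = []
    for _ in range(k):
        r = np.ldexp(r, b)              # move the next b bits above the binary point
        s = np.floor(r)
        slices.append(s.astype(np.int64))
        r = r - s                       # exact: drop the bits just extracted
    return np.sign(v).astype(np.int64), slices, emax

def reconstruct(sign, slices, emax, b=8):
    """Inverse of split_unsigned (exact when enough slices were kept)."""
    acc = np.zeros(len(sign))
    for i, s in enumerate(slices):
        acc += np.ldexp(s.astype(np.float64), -b * (i + 1))
    return sign * np.ldexp(acc, emax)

def emulated_dot(a, x, b=8, max_slices=16):
    """FP64 dot product assembled from integer slice products, with a native fallback."""
    a = np.asarray(a, dtype=np.float64)
    x = np.asarray(x, dtype=np.float64)
    if not (np.isfinite(a).all() and np.isfinite(x).all()):
        return float(np.dot(a, x))      # NaN/Inf present: let native FP64 handle it
    ka, kx = slice_count(a, b), slice_count(x, b)
    if max(ka, kx) > max_slices:
        return float(np.dot(a, x))      # too many slices: emulation would not pay off
    sa, A, ea = split_unsigned(a, ka, b)
    sx, X, ex = split_unsigned(x, kx, b)
    terms = []
    for i, Ai in enumerate(A):
        for j, Xj in enumerate(X):
            partial = int(np.dot(sa * Ai, sx * Xj))  # exact integer dot product
            terms.append(math.ldexp(partial, ea + ex - b * (i + j + 2)))
    return math.fsum(terms)

a = np.array([1.0, -3.5e-3, 2.0**-20 + 2.0**-70, 123.456])
x = np.array([0.1, 7.0, -2.5, 1e-3])
print(slice_count(a), emulated_dot(a, x), np.dot(a, x))
```

Every slice pair is kept and each integer partial product is exact, so the scaled terms sum to the exact dot product, which `math.fsum` then adds accurately. Production Ozaki-style schemes instead drop low-order slice pairs to save work.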
3. Data-Structural ADP: Explicit Precision Tracking in Numerical Software
Netay presents a data-structural ADP technique in which every IEEE 754 floating-point value is extended with an “exact-bits” counter, producing a pair $(x, k)$ where $k$ is the number of reliable mantissa bits of $x$ (Netay, 2024). All numerical operations (arithmetic, library calls, and reductions) update this counter using explicit, operation-specific recurrences reflecting alignment loss, rounding, and condition number.
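For matrix multiplication $C = AB$, a bound built on a “tropical” (max-plus) matrix product involving $K_A$ and $K_B$, the exact-bits arrays of $A$ and $B$, provides an elementwise lower bound on the reliable bits in $C$. In the max-plus semiring, addition is replaced by $\max$ and multiplication by $+$, so the product of two arrays $X$ and $Y$ is
$$(X \otimes Y)_{ij} = \max_k \left( X_{ik} + Y_{kj} \right).$$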
Refinement methods incorporating operand magnitudes further sharpen these bounds.
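The tropical product itself is easy to compute with NumPy broadcasting. The sketch below implements only the generic max-plus matrix product on integer arrays. It does not reproduce how Netay combines this product with bit counts, magnitudes, or correction terms to obtain the certified bound, and `maxplus_matmul` is a hypothetical name.

```python
import numpy as np

def maxplus_matmul(X, Y):
    """Tropical (max-plus) product: result[i, j] = max_k (X[i, k] + Y[k, j])."""
    X = np.asarray(X)
    Y = np.asarray(Y)
    if X.shape[1] != Y.shape[0]:
        raise ValueError("inner dimensions must match")
    # Broadcast to shape (m, n, p), then reduce over the shared index n with max instead of sum.
    return (X[:, :, None] + Y[None, :, :]).max(axis=1)

print(maxplus_matmul([[1, 2], [3, 4]], [[0, 5], [1, 0]]))
```

The broadcast form needs O(m·n·p) memory, which is fine for illustration. A blocked loop would be preferable for large arrays.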
This “live bit-certification” enables dynamic-precision control: one can escalate datatype bitwidth, select error-compensating algorithms, or trigger warnings “when and where” precision loss becomes material, all with minimal overhead. The approach is implemented in xnumpy and integrates transparently across array and tensor-network computation.
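The following hypothetical scalar version of a value carries its own exact-bits count, which makes the idea concrete. The propagation rules are simple first-order error bounds chosen for illustration:

- absolute errors add under addition;
- relative errors add under multiplication;
- each operation adds one FP64 rounding unit.

These are not the operation-specific recurrences used in xnumpy, and the class name `Tracked` is invented here. The usage lines show catastrophic cancellation collapsing the certified bits from 53 to under 20.

```python
import math
from dataclasses import dataclass

FP64_BITS = 53

@dataclass(frozen=True)
class Tracked:
    """A float paired with the number of mantissa bits believed reliable (hypothetical rules)."""
    value: float
    bits: int = FP64_BITS

    def abs_error(self):
        # Absolute error bound implied by the bit count: |x| * 2**-bits.
        return abs(self.value) * 2.0 ** -self.bits

    @staticmethod
    def from_error(value, err):
        """Convert an absolute error bound back into a count of reliable bits."""
        if value == 0.0:
            return Tracked(0.0, 0)  # a zero result carries no relative precision to certify
        err = max(err, abs(value) * 2.0 ** -FP64_BITS)  # never better than FP64 rounding
        bits = math.floor(-math.log2(err / abs(value)))
        return Tracked(value, max(0, min(FP64_BITS, bits)))

    def __add__(self, other):
        s = self.value + other.value
        # Absolute errors add; the final rounding contributes one more unit.
        return Tracked.from_error(
            s, self.abs_error() + other.abs_error() + abs(s) * 2.0 ** -FP64_BITS)

    def __sub__(self, other):
        return self + Tracked(-other.value, other.bits)

    def __mul__(self, other):
        p = self.value * other.value
        # Relative errors add to first order, plus one rounding.
        rel = 2.0 ** -self.bits + 2.0 ** -other.bits + 2.0 ** -FP64_BITS
        return Tracked.from_error(p, abs(p) * rel)

x = Tracked(1.0 + 1e-10)
y = Tracked(1.0)
diff = x - y  # catastrophic cancellation: most leading bits cancel
print(diff.value, diff.bits)
```

A dynamic-precision controller could watch `bits` and switch to a wider type or a compensated algorithm once it drops below a threshold.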
4. Fine-Grained Arithmetic ADP: Runtime Precision Reconfiguration in Scientific Computing
Runtime reconfigurable arithmetic units for floating-point multiplication, such as the R2F2 unit, epitomize ADP at the hardware datapath level (Hao, 2024). R2F2 dynamically reallocates bitwidth between exponent and mantissa for each multiply, based on operand range and detected overflow/redundancy.
After each multiplication, if overflow/underflow is detected, exponent width is grown (by stealing bits from the mantissa); if leading exponent bits are redundant, a bit is given back to the mantissa for future multiplies. This is performed by a precision-adjust FSM intertwined with the standard datapath, and incurs negligible resource and latency cost.
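The exponent/mantissa trading policy can be imitated in software. The sketch below is a toy model under these assumptions:

- a 16-bit format with an IEEE-style bias and no subnormals;
- products computed in FP64, then rounded to the current split;
- one exponent bit stolen per overflow retry;
- one bit returned when the result would also fit with a narrower exponent.

The function names are hypothetical, and the rules simplify the FSM described above rather than reproduce R2F2's actual logic.

```python
import math

TOTAL_BITS = 16  # 1 sign bit + e_bits exponent bits + m_bits fraction bits

def quantize(x, e_bits, m_bits):
    """Round x to a toy float format; return None if it falls outside the normal range."""
    if 1 + e_bits + m_bits != TOTAL_BITS:
        raise ValueError("exponent and mantissa widths must fill the 16-bit word")
    if x == 0.0:
        return 0.0
    bias = 2 ** (e_bits - 1) - 1
    m, e = math.frexp(abs(x))          # abs(x) = m * 2**e with 0.5 <= m < 1
    sig, exp = 2.0 * m, e - 1          # abs(x) = sig * 2**exp with 1 <= sig < 2
    scale = 2 ** m_bits
    sig = round(sig * scale) / scale   # keep m_bits fraction bits (round half to even)
    if sig == 2.0:                     # rounding carried into the next binade
        sig, exp = 1.0, exp + 1
    if exp > bias or exp < 1 - bias:   # overflow or underflow (no subnormals here)
        return None
    return math.copysign(sig * 2.0 ** exp, x)

def adaptive_mul(a, b, e_bits, m_bits):
    """Multiply, widening the exponent on overflow and narrowing it when bits are redundant."""
    while True:
        y = quantize(a * b, e_bits, m_bits)
        if y is not None:
            break
        if m_bits == 0:
            raise OverflowError("no mantissa bits left to reallocate")
        e_bits, m_bits = e_bits + 1, m_bits - 1    # steal a mantissa bit and retry
    if e_bits > 2 and quantize(y, e_bits - 1, m_bits + 1) is not None:
        e_bits, m_bits = e_bits - 1, m_bits + 1    # give a bit back for future multiplies
    return y, e_bits, m_bits

print(quantize(300.0 * 300.0, 5, 10))   # None: overflows the FP16-like 5/10 split
print(adaptive_mul(300.0, 300.0, 5, 10))
print(adaptive_mul(2.0, 3.0, 6, 9))
```

With the 5/10 split the format matches FP16's normal range, so `quantize(65504.0, 5, 10)` is representable while 90000 is not. The adaptive version retries with a 6-bit exponent instead of producing infinity.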
Empirical results show that R2F2 (at 16 bits) achieves up to 70% lower arithmetic error than standard half precision, matches full 32-bit accuracy in PDE simulators (heat equation, shallow-water equations), and requires only a handful of retry or bit-reallocation events. This demonstrates the efficacy of operand-wise dynamic-precision adaptation in critical scientific kernels.
5. ADP in Analog Computing: Programmable, Noise-Equivalent Precision via Redundancy
In analog and optical AI accelerators, ADP manifests as programmable redundant coding: dot products are repeated in space or time and averaged to reduce the effective noise variance (Garg et al., 2021). The achieved precision is mapped by equating the noise variance $\sigma^2$ of the analog result to the quantization-noise variance of a digital $B$-bit uniform quantizer with step $\Delta = R/2^B$ over full-scale range $R$: $\sigma^2 = \Delta^2/12$, i.e. $B = \log_2\big(R/(\sqrt{12}\,\sigma)\big)$. Because averaging $N$ independent repeats divides the noise variance by $N$, analog precision is raised by $\tfrac{1}{2}\log_2 N$ bits for $N$ repeats. This gain carries corresponding energy, area, and throughput trade-offs, which can be learned automatically per layer or channel via SGD on an unconstrained loss balancing energy and accuracy.
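The precision mapping can be checked numerically. The snippet assumes a unit full-scale range, Gaussian per-shot noise with an illustrative standard deviation, and independent repeats. It confirms empirically that averaging N shots shrinks the noise standard deviation by about √N, which is what yields the ½·log2 N bit gain. `noise_equivalent_bits` uses the standard Δ²/12 quantization-noise model; Garg et al. may normalize the range differently.

```python
import numpy as np

def noise_equivalent_bits(sigma, full_range=1.0):
    """Bits B of a uniform quantizer whose noise variance (full_range / 2**B)**2 / 12 equals sigma**2."""
    return np.log2(full_range / (np.sqrt(12.0) * sigma))

def averaged_noise_std(sigma, repeats, trials, rng):
    """Empirical std of the mean of `repeats` independent Gaussian noise samples."""
    noise = sigma * rng.standard_normal((trials, repeats))
    return noise.mean(axis=1).std()

rng = np.random.default_rng(0)
sigma = 0.01  # hypothetical per-shot analog noise, relative to a unit full-scale range
base_bits = noise_equivalent_bits(sigma)
for n in (1, 4, 16):
    measured = averaged_noise_std(sigma, n, trials=20_000, rng=rng)
    print(f"N={n:2d}: std={measured:.5f}  "
          f"measured gain={noise_equivalent_bits(measured) - base_bits:.2f} bits  "
          f"predicted gain={0.5 * np.log2(n):.2f} bits")
```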
Evaluations on CNNs and BERT show that per-layer ADP yields substantial energy reductions for both model families at a specified accuracy-drop budget, with layerwise “noise-equivalent” bit requirements varying widely across layers. A plausible implication is that application-specific, nonuniform allocation of hardware resources and energy is valuable for maximizing accuracy under aggressive precision scaling.
6. ADP in Stochastic Simulation-Based Optimization
In blackbox and derivative-free optimization, ADP operates by adapting the stochastic noise level (or, equivalently, the computation granularity or precision) used in each objective-function evaluation (Alarie et al., 2019). Algorithms such as DpMads (Dynamic Precision MADS) select a “precision index” $\rho$, mapped to a noise standard deviation $\sigma(\rho)$, and dynamically tune $\rho$ as a function of the statistical reliability of comparisons in poll steps.
Crucially, DpMads allows $\rho$ to rise or fall based on p-value thresholds that quantify decision confidence, keeping precision just high enough for progress while minimizing simulation effort. This contrasts with the monotonic escalation in traditional Robust-MADS or MpMads. The result is orders-of-magnitude reductions in computational cost for high-noise simulations, while provable convergence to Clarke-stationary points is retained.
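A two-directional precision rule of this kind can be sketched in a few lines. Everything here is a hypothetical stand-in chosen for illustration:

- the mapping σ(ρ) = σ0·2^(−ρ);
- a two-sided z-test with known noise level;
- the thresholds `decisive` and `inconclusive`.

DpMads's actual statistical test and update schedule may differ. The point is only that ρ can move down as well as up.

```python
import math

def sigma_of(rho, sigma0=1.0):
    """Hypothetical mapping from precision index to noise std: each step halves the noise."""
    return sigma0 * 2.0 ** -rho

def comparison_pvalue(f_trial, f_incumbent, sigma, n_trial=1, n_incumbent=1):
    """Two-sided z-test p-value for 'both points have the same true objective value'."""
    se = sigma * math.sqrt(1.0 / n_trial + 1.0 / n_incumbent)
    z = abs(f_trial - f_incumbent) / se
    return math.erfc(z / math.sqrt(2.0))

def update_precision(rho, p_value, decisive=0.01, inconclusive=0.2, rho_min=0, rho_max=10):
    """Raise rho when a poll comparison is inconclusive; lower it when it is clear-cut."""
    if p_value > inconclusive:
        return min(rho + 1, rho_max)   # cannot tell the points apart: evaluate more precisely
    if p_value < decisive:
        return max(rho - 1, rho_min)   # decision is safe at this noise level: save effort
    return rho

rho = 3
p = comparison_pvalue(1.0, 0.0, sigma_of(rho))
print(f"p={p:.2e}, next rho={update_precision(rho, p)}")
```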
Numerical tests show that DpMads can solve norm2 and industrial asset-management problems with substantially fewer simulation draws than fixed-precision direct search, because it raises precision only where the optimization algorithm needs it.
7. Limitations, Trade-Offs, and Prospects for ADP
ADP approaches generally trade area and power (hardware), implementation complexity (algorithms), or modest overhead (software and hardware FSMs) for substantial efficiency and accuracy gains. Hardware-level techniques such as ADiP incur an area and power overhead while delivering up to 4× higher compute density, but they are presently limited to discrete precision steps (e.g., 8→4→2 bits). Algorithmic ADP may overestimate the required slice count (ESC) and occasionally fall back to high precision unnecessarily. This failure mode is conservative: it costs speed, not accuracy.
Some dynamic-precision schemes are restricted by the granularity of control (layerwise/channel-wise in neural networks; per-operation in floating-point arithmetic). Integrating more sophisticated online quantization analysis, per-PE bit-serial scaling, or tight mathematical analysis of error propagation is a major direction for extending ADP in future architectures and workloads.
Across hardware acceleration, numerical linear algebra, scientific computing, analog inference, and optimization, ADP enables systems to monitor and adapt numeric precision automatically, ensuring that accuracy, performance, and resource objectives are simultaneously met (Abdelmaksoud et al., 12 Oct 2025, Schwarz et al., 16 Nov 2025, Netay, 2024, Hao, 2024, Garg et al., 2021, Alarie et al., 2019).