Approximate Multipliers (AxMs)
- Approximate multipliers are digital circuits that intentionally trade arithmetic precision for improved energy, area, and delay through controlled inexact computation.
- They employ techniques such as approximate adders, bit-plane truncation, and compressor-based reduction to achieve tailored error-energy tradeoffs.
- These designs are crucial in applications like deep neural network inference, digital signal processing, and imaging where bounded error is acceptable for efficiency gains.
Approximate multipliers (AxMs) are digital hardware circuits that intentionally trade arithmetic exactness for improved energy, area, and delay characteristics. They are a cornerstone of approximate computing, a paradigm that optimizes systems for error-resilient applications by permitting controlled deviations from numerical accuracy. AxMs are widely deployed in domains such as deep neural network (DNN) inference, digital signal processing (DSP), and image/video pipelines, where application-level quality can tolerate bounded arithmetic imprecision in exchange for significant gains in power, area, and performance.
1. Structural Principles and Design Parameters
Classical multipliers—such as array or tree-based architectures—sum all partial products using networks of full adders (FAs) and half adders (HAs) to obtain an exact product. In AxMs, inexactness is introduced through various circuit-level techniques:
- Approximate Adders: Exact FAs in the summation network are replaced by inexact variants (e.g., Approximate Mirror Adders AMA1–AMA5) that simplify logic at the expense of error on certain input combinations. These reduce area and power but induce bounded and quantifiable arithmetic error (Masadeh et al., 2019).
- Bit-Plane Truncation: The least significant k bits of the multiplier output are either calculated approximately (using Ax FAs) or hard-wired to zero, controlling the "degree" of approximation. Degrees D₁–D₄ designate 7, 8, 9, or 16 bits approximated, respectively (Masadeh et al., 2019).
- Compressor-based Reduction: In modern high-performance multipliers, multi-input compressors (e.g., 4:2, 3:2) can be made "approximate" by omitting or simplifying logic in low-weight positions (Jaswal et al., 31 Aug 2025, Karimi et al., 2021).
- Broken-Array and Booth Approximations: Partial-product (PP) rows or columns are masked out, as in Broken Booth Multipliers (BBM), where the Vertical Breaking Level (VBL) specifies the number of LSB columns set to zero (Farshchi et al., 2020).
- Truncation and Compensation: Novel designs such as scaleTRIM use input truncation (e.g., using leading-one position, followed by linearization and LUT-based compensation) to provide efficient, adjustable tradeoffs between hardware cost and error (Farahmand et al., 2023).
- Sequential Approximate Multipliers: Carry chains are segmented to create an approximation knob in classical shift-and-add architectures, offering a latency–accuracy tradeoff not available in combinational schemes (Echavarria et al., 2021).
The two dominant tuning parameters are thus (i) which internal functional units (adders, compressors) are approximated and (ii) how many result bits (or PP columns) are subjected to approximation.
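As a concrete illustration of the truncation knob, the following is a minimal software model; the function name, signature, and the hard-zeroing of low bits are assumptions for illustration (degrees D₁–D₄ in the actual designs approximate those bits with inexact FAs rather than zeroing them):

```python
def truncated_mul(a, b, k, n=8):
    """Model of an n-bit AxM whose k least-significant product bits are
    hard-wired to zero (one simple form of bit-plane truncation)."""
    assert 0 <= a < (1 << n) and 0 <= b < (1 << n)
    return (a * b >> k) << k  # clear the k LSBs of the exact product

# With k = 8 the error per multiply is bounded by 2**8 - 1 = 255:
# truncated_mul(200, 150, 8) returns 29952 versus the exact 30000.
```

Because only the low bit-plane is affected, the worst-case error is known by construction, which is what makes the degree such a predictable tuning knob.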
2. Error Metrics and Input–Error Dependency
Evaluation of AxMs is grounded in precise error metrics, typically computed over exhaustive operand sweeps for small bit-widths:
- Error Distance (ED): ED = |P − P̂|, the absolute difference between the exact product P and the approximate product P̂
- Normalized Error Distance (NED): NED = ED / P (when P ≠ 0)
- Mean Error Distance (MED): Arithmetic mean of ED over all input pairs
- Maximum Error Distance (MaxED): Maximal ED observed over all input pairs
- Error Rate (ER): Fraction of input pairs producing a nonzero error
- Peak Signal-to-Noise Ratio (PSNR): Especially relevant for image-like outputs; computed from output MSE (Masadeh et al., 2019, Jaswal et al., 31 Aug 2025)
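For small bit-widths these metrics follow directly from an exhaustive operand sweep; a minimal sketch (function and key names are illustrative), exercised here on a 4-bit multiplier with its two LSBs truncated:

```python
def error_metrics(approx_mul, n=4):
    """Exhaustively sweep all operand pairs of an n-bit multiplier and
    return the standard AxM error statistics (MED, MaxED, ER)."""
    eds = []
    for a in range(1 << n):
        for b in range(1 << n):
            eds.append(abs(a * b - approx_mul(a, b)))  # per-pair ED
    return {
        "MED": sum(eds) / len(eds),                 # mean error distance
        "MaxED": max(eds),                          # worst-case ED
        "ER": sum(e != 0 for e in eds) / len(eds),  # error rate
    }

# Example: 4-bit multiplier with the 2 LSB product bits zeroed
m = error_metrics(lambda a, b: (a * b >> 2) << 2, n=4)
```

For this multiplier MaxED is 3 (the largest value the two cleared bits can hold), while ER reflects how often a product has a nonzero low bit-pair.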
Input–error dependency is a defining feature: the error magnitude is strongly correlated with the location of the operand pair in the input space. Empirical heatmaps reveal that:
- Increasing the approximation degree (e.g., D₁→D₄) increases both mean and standard deviation of ED,
- Variation between FA-types (AMA1–AMA5) at a fixed degree is much smaller than the effect of degree itself,
- Correlation between degree and ED is ρ ≈ 0.90–0.98, but only ρ ≈ 0.3–0.5 for FA type (Masadeh et al., 2019).
This input dependence enables the design of guard circuits: lightweight runtime monitors (e.g., decision trees on input MSBs) can switch to a more accurate multiplier mode or re-compute exactly when the predicted error exceeds application tolerance, providing quality-of-result guarantees at minimal energy overhead.
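A runtime guard of this kind can be sketched as below; the MSB-sum predicate and the `msb_limit` threshold are hypothetical stand-ins for the trained decision-tree classifiers described in the text, not a published design:

```python
def guarded_mul(a, b, approx_mul, msb_limit=12, n=8):
    """Route 'at-risk' operand pairs to exact multiplication.
    The predicate (sum of the top 4 bits of each operand) is purely
    illustrative; in practice the guard is a small classifier trained
    on the multiplier's measured input-error heatmap."""
    if (a >> (n - 4)) + (b >> (n - 4)) > msb_limit:
        return a * b              # exact fallback for risky inputs
    return approx_mul(a, b)       # cheap approximate path otherwise

trunc = lambda a, b: (a * b >> 4) << 4
# Large operands take the exact path; small ones stay approximate.
```

The energy overhead stays small because the predicate inspects only a few MSBs, while the exact fallback fires only on the minority of inputs the classifier flags.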
3. Energy, Area, and Delay Trade-offs
Hardware efficiency is the principal motivation for AxM adoption. Comprehensive studies report:
| Approximation Degree | Area & Power Savings | PSNR (dB) | MaxED | Application Guidance |
|---|---|---|---|---|
| D₁ | 15–25% | 2–5 dB drop | ~500 | Modest hardware savings |
| D₂ | 25–40% | ~30 | ~1000 | Best energy–error Pareto |
| D₃ | 35–55% | ~25 | ~2000 | Suitable for DNN inference |
| D₄ | 50–70% | <15 | >16k | Excessive error except for tolerant apps |
The Pareto front is nearly linear in area/energy for small increases in approximation, but error grows nearly exponentially at higher degrees, with a "knee" between D₂ and D₃ marking the optimal operating point for most use cases (Masadeh et al., 2019).
Broken-Booth and array/truncation approaches produce similar trade-offs, with BBMs achieving up to 58% dynamic power reduction for an MSE penalty scaling exponentially with VBL. In DSP applications (e.g., 30-tap FIR), 17% power reduction at sub-0.4 dB SNR loss has been demonstrated (Farshchi et al., 2020). The systematic pruning in scaleTRIM enables similar energy Pareto improvements by combining leading-one detection, truncation, and piecewise compensation (Farahmand et al., 2023).
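The leading-one truncation stage of such designs can be sketched as follows; this is a simplified software model (names and the window-width parameter `t` are assumptions), and the linearization and LUT-based compensation stages of scaleTRIM are deliberately omitted:

```python
def lod_truncate(x, t):
    """Keep the leading one of x plus the next t bits; zero the rest.
    A model of leading-one-detector (LOD) based input truncation."""
    if x < (1 << (t + 1)):
        return x                          # already t+1 bits or fewer
    shift = x.bit_length() - (t + 1)      # bits below the kept window
    return (x >> shift) << shift

def lod_mul(a, b, t=3):
    """Approximate product of the two LOD-truncated operands
    (compensation omitted, so the result always underestimates)."""
    return lod_truncate(a, t) * lod_truncate(b, t)
```

Because both truncated operands are lower bounds on the originals, the raw product carries a one-sided (negative) bias, which is exactly what the compensation stage in the full design corrects.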
4. Application–Domain-Specific Deployment
AxMs are deployed broadly in:
- Deep Neural Network Inference: Large-scale DNNs can absorb uniform or near-uniform multiplier error with minimal accuracy loss, especially when batch normalization is appropriately rescaled to absorb bias (Kim et al., 2020, Pinos et al., 8 Apr 2024). For example, replacing FP32 multiplies by AxMs (mul8u_NGR) in DNN layers achieves 53.8% arithmetic energy reduction with only a 0.65 percentage point drop in CIFAR-10 accuracy (Pinos et al., 8 Apr 2024). Design choices should be informed by whether application constraints care more about maximum deviation (e.g., for safety-critical domains) or mean error (e.g., average-case DNN inference).
- Digital Signal Processing: FIR filtering, image blending, and video pipelines benefit from AxMs when SNR drops are bounded (e.g., 0.3–0.5 dB) (Farshchi et al., 2020, Masadeh et al., 2018). Pareto-optimal designs achieve up to 70% improvement in compound quality–area–power figures of merit (FOMs).
- Hybrid and Reconfigurable Systems: Runtime mode switching (dynamic accuracy reconfiguration) and guard classifiers are practical for scenarios with variable error tolerance (Masadeh et al., 2019). This enables aggressive energy optimization while guaranteeing error bounds for critical inputs.
For sequential multipliers with segmented carry chains, accuracy–configurable architectures offer minimal area overhead and latency reductions up to 30% relative to exact shift-and-add designs, while maintaining sufficiently low normalized error for most signal domains (Echavarria et al., 2021).
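To make the FIR deployment above concrete, the toy sketch below runs the same filter with an exact and a truncated multiplier and measures the resulting SNR; the taps, signal, and truncation depth are made-up illustrative values, not the 30-tap setup cited:

```python
import math

def fir(x, h, mul):
    """Direct-form FIR filter using a caller-supplied multiplier."""
    return [sum(mul(hk, x[n - k]) for k, hk in enumerate(h) if n - k >= 0)
            for n in range(len(x))]

def snr_db(ref, out):
    """Signal-to-noise ratio of `out` against the exact reference."""
    noise = sum((r - o) ** 2 for r, o in zip(ref, out))
    return math.inf if noise == 0 else 10 * math.log10(
        sum(r * r for r in ref) / noise)

# Toy setup: made-up integer taps and an 8-bit test signal
h = [3, 7, 7, 3]
x = [(37 * i) % 256 for i in range(64)]
exact = fir(x, h, lambda a, b: a * b)
approx = fir(x, h, lambda a, b: (a * b >> 4) << 4)  # 4-LSB truncation
loss = snr_db(exact, approx)  # remaining quality after approximation, in dB
```

Swapping the multiplier lambda is all it takes to evaluate a different AxM against the same filter, which is how such SNR-bounded comparisons are typically run.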
5. Automated Generation and Optimization
Modern AxM libraries and generative methodologies employ formal design space exploration to automatically produce Pareto-frontier circuits:
- Bayesian Optimization: AMG explores the design space by assigning simplification options to each HA in the PP compression array, guided by FPGA-aware cost functions such as PDAE (Power × Delay × Area × Error). Resulting designs show 29%–38% lower cost–error products than any of 1167 prior designs (Li et al., 2023).
- Evolutionary and Differentiable NAS: Tools such as ApproxDARTS and CGP-based NAS incorporate AxMs from libraries like EvoApproxLib or EvoApproxLib‐Lite during neural architecture search. These tools jointly optimize network topology and multiplier selection, yielding application-specific CNNs that trade off accuracy, network size, and multiplication power (Pinos et al., 8 Apr 2024, Pinos et al., 2021).
Parameterizable architectures, such as those using degree of approximation and component type as configuration knobs, support rapid algorithm–hardware co-optimization according to application error tolerance and hardware budgets (Masadeh et al., 2019, Farahmand et al., 2023).
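A minimal sketch of such a knob-driven design-space sweep follows, pairing exhaustively measured MED with a hypothetical area model (the ~6% per truncated bit and ~2% compensation cost are illustrative placeholders, not measurements from the cited generators):

```python
def med_trunc(k, n=6, compensate=False):
    """Exhaustively measured MED of an n-bit multiplier with k product
    LSBs truncated; `compensate` adds 2**(k-1) before truncating,
    modeling a cheap rounding-style compensation circuit."""
    off = (1 << (k - 1)) if (compensate and k > 0) else 0
    total = sum(abs(a * b - (((a * b + off) >> k) << k))
                for a in range(1 << n) for b in range(1 << n))
    return total / (1 << (2 * n))

def pareto_front(points):
    """Keep only (cost, error) points not dominated by another point."""
    return [p for p in points
            if not any(q != p and q[0] <= p[0] and q[1] <= p[1]
                       for q in points)]

# Area in arbitrary relative units: each truncated bit saves ~6,
# compensation logic costs ~2 (illustrative numbers only)
designs = [(100 - 6 * k + (2 if comp else 0),
            med_trunc(k, compensate=comp))
           for k in range(7) for comp in (False, True)]
front = pareto_front(designs)  # configurations worth keeping
```

The sweep automatically discards dominated configurations, e.g. paying for compensation at degree zero, where there is no error to compensate.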
6. Guidelines and Practical Recommendations
Key design recommendations arising from this corpus are:
- Select the degree of approximation (number of truncated/approximate bits) as the dominant error–energy tradeoff knob; the precise adder or compressor type is a secondary, fine-tuning option (Masadeh et al., 2019).
- For image/video or perception pipelines, maintain PSNR ≥ 25 dB for visually lossless results; choose D₂ or AMA4/AMA5 designs for best compromise (Masadeh et al., 2019).
- For DNN accelerators, focus on minimizing the mean error; in 8-bit MAC arrays, AxMs such as AMA3–D₃ yield substantial energy savings with MED below 2% of the output range (Masadeh et al., 2019).
- When application constraints require strict upper bounds on output error, deploy lightweight input–error guards or classifiers to reroute at-risk input pairs to exact computation, with <5% energy overhead (Masadeh et al., 2019).
- The use of sequential AxMs based on segmented or truncated carry chains is uniquely efficient for very wide multipliers when lowest possible logic resource usage and moderate latency are priorities (Echavarria et al., 2021).
- Exhaustive input–output simulations or block-wise heatmaps are critical to characterize, verify, and bound the error patterns for given input distributions, especially where input–error correlation is strong.
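A block-wise characterization of the kind recommended above can be sketched as follows; the bin count and the truncated multiplier under test are illustrative choices:

```python
def error_heatmap(approx_mul, n=8, bins=4):
    """Block-wise mean-ED map: split each operand range into `bins`
    equal blocks and average the error distance inside each block,
    a coarse software version of an input-error heatmap."""
    size = (1 << n) // bins
    grid = [[0.0] * bins for _ in range(bins)]
    for i in range(bins):
        for j in range(bins):
            total = sum(abs(a * b - approx_mul(a, b))
                        for a in range(i * size, (i + 1) * size)
                        for b in range(j * size, (j + 1) * size))
            grid[i][j] = total / (size * size)  # mean ED in this block
    return grid

# Characterize an 8-bit multiplier with its 4 LSBs truncated
grid = error_heatmap(lambda a, b: (a * b >> 4) << 4)
```

Such a map directly exposes the input regions where the design violates the application's tolerance, which is also the training signal for the guard classifiers discussed in Section 2.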
7. Outlook and Future Research
The AxM landscape is advancing toward:
- Integrated Co-design: Neural architecture search and DNN compiler stacks increasingly incorporate AxM awareness, enabling end-to-end co-optimization of energy-precision tradeoffs at network, module, and circuit levels (Pinos et al., 8 Apr 2024, Pinos et al., 2021).
- Quality Adaptivity: Real-time quality monitoring and dynamic approximation adaptation mechanisms become feasible and necessary as approximate computing is pushed into safety- and latency-constrained systems (Masadeh et al., 2019).
- FPGA and ASIC Portability: Dedicated generators such as AMG bridge architectural gaps between ASIC and FPGA, optimizing AxMs for highly divergent implementation fabrics (Li et al., 2023).
- Fine-grained Error Modeling: Analytical frameworks that relate error statistics (bias and variance) of AxMs to network-level distortion and accuracy loss facilitate rapid, hardware-agnostic design-space pruning (Alahakoon et al., 6 Dec 2025).
In summary, approximate multipliers constitute an essential class of arithmetic circuits. When judiciously parameterized and validated, they deliver state-of-the-art area, energy, and delay reductions across a wide array of error-tolerant and performance-critical digital systems, and robust analytical and empirical characterization enables precise application alignment and quality assurance.