Low Power Approximate Multiplier Architecture
- Low Power Approximate Multiplier Architecture is a specialized digital circuit that introduces controlled approximation via NAND/NOR gate designs to minimize energy consumption.
- It employs a single-error 4:2 compressor across all partial product columns, achieving a power-delay product as low as 91.20 fJ and reducing energy consumption by over 30%.
- The design has been validated in deep neural network tasks, delivering high image quality (PSNR ≈ 34.95 dB, SSIM ≈ 0.9713) and negligible accuracy loss in digit recognition.
A low power approximate multiplier architecture is a specialized digital circuit that minimizes energy consumption by judiciously introducing controlled error into arithmetic multiplication operations while optimizing hardware area, critical path delay, and switching activity. In the context of deep neural networks and other error-resilient applications, such architectures achieve a balance between computational accuracy and efficiency, making them suitable for high-throughput, low-power inference engines and signal processing systems.
1. Core Architectural Concept
Low power approximate multipliers target the dominant sources of dynamic energy consumption in digital arithmetic, namely logic switching and redundant computation in partial product accumulation. The design introduced in (Jaswal et al., 31 Aug 2025) is built around a high-accuracy 4:2 compressor structure in which only a single input combination produces an erroneous output, conferring an error probability of 1/256 and hence minimal degradation of computational integrity.
Design highlights:
- Deployment of exclusively approximate compressors in all partial product columns, not just LSBs.
- Elimination of additional carry propagation chains, allowing fast PPA (partial product accumulation) and reduced capacitance.
- Compact realizations of the carry, sum, and intermediate signals built exclusively from NAND and NOR gates, which reduces gate count, propagation delay, and leakage.
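For reference, the behavior of an *exact* 4:2 compressor (the baseline that approximate designs trade against) can be modeled and exhaustively checked in a few lines of Python. The equations below are the standard exact formulation, not the paper's approximate NAND/NOR realization:

```python
from itertools import product

# Exact 4:2 compressor: five inputs (x1..x4, cin), three outputs (sum, carry, cout)
# satisfying the arithmetic identity x1 + x2 + x3 + x4 + cin == sum + 2*(carry + cout).
def compressor_4_2(x1, x2, x3, x4, cin):
    s = x1 ^ x2 ^ x3 ^ x4 ^ cin                  # sum output
    cout = x3 if (x1 ^ x2) else x1               # depends only on x1..x3, not on cin
    carry = cin if (x1 ^ x2 ^ x3 ^ x4) else x4
    return s, carry, cout

# Exhaustive check over all 32 input combinations.
for bits in product((0, 1), repeat=5):
    s, carry, cout = compressor_4_2(*bits)
    assert sum(bits) == s + 2 * (carry + cout)
```

An approximate compressor deliberately violates this identity for a small subset of input patterns in exchange for a simpler gate netlist; the design discussed here errs on only one such pattern.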
2. Energy and Hardware Performance
Integration of this high-accuracy approximate compressor within an 8×8 unsigned multiplication datapath produces pronounced energy efficiency. Synthesis under UMC 90nm technology demonstrated:
- Up to 30.24% lower energy consumption than the best state-of-the-art (SOTA) reference multipliers.
- Power-delay product (PDP) as low as 91.20 fJ, notably lower than prior SOTA.
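As a sanity check on units, the power-delay product is simply average power multiplied by critical-path delay. The power and delay figures below are hypothetical values chosen only to illustrate how a PDP in the reported range arises:

```python
# Hypothetical operating point: 100 uW average power, 0.912 ns critical-path delay.
power_w = 100e-6      # watts (hypothetical, not from the paper)
delay_s = 0.912e-9    # seconds (hypothetical, not from the paper)

pdp_joules = power_w * delay_s
pdp_fj = pdp_joules * 1e15   # convert J -> fJ
print(f"PDP = {pdp_fj:.2f} fJ")   # -> PDP = 91.20 fJ
```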
Comparative error metrics:
- Mean Relative Error Distance (MRED) of 0.109%.
- Error Rate (ER) of 6.994%.
Together, these metrics indicate that errors affect fewer than 7% of products and, when they do occur, remain small in magnitude.
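Both metrics can be computed by exhaustive simulation over the full 8×8 unsigned input space. The sketch below does so for a simple truncation-based approximate multiplier, a stand-in used only to illustrate the metric definitions, not the compressor-based design evaluated in the paper:

```python
# MRED and ER by exhaustive simulation over all 8-bit unsigned operand pairs.
def approx_mul(a, b):
    # Illustrative approximation: drop the two LSBs of each operand (truncation).
    return ((a >> 2) * (b >> 2)) << 4

errors = 0
red_sum = 0.0
count = 0
for a in range(256):
    for b in range(256):
        exact = a * b
        approx = approx_mul(a, b)
        if approx != exact:
            errors += 1
        if exact != 0:                       # relative error is undefined at zero
            red_sum += abs(exact - approx) / exact
            count += 1

er = errors / (256 * 256)                    # Error Rate
mred = red_sum / count                       # Mean Relative Error Distance
print(f"ER = {er:.2%}, MRED = {mred:.2%}")
```

The cited design reports ER = 6.994% and MRED = 0.109%; the truncation scheme above is far cruder and serves only to show how the two metrics are defined and measured.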
3. Application in Neural Network Architectures
The multiplier was integrated into convolutional layers for both image denoising (FFDNet) and handwritten digit recognition (MNIST), and compared against a suite of previous approximate and exact multiplier designs.
- In image denoising for Gaussian noise level σ = 25, PSNR approaches 34.95 dB and SSIM is 0.9713, outperforming other approximate designs.
- For handwriting recognition, classification accuracy degradation is negligible: 93.54% (Keras CNN) and 96.45% (LeNet-5), both within a high accuracy range and nearly identical to exact counterparts.
This demonstrates that, despite aggressive compressor approximation, system-level accuracy and fidelity remain almost unimpaired.
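For context, the PSNR figure quoted above is derived from the mean squared error between the denoised and reference images. A minimal implementation of the standard definition (assuming 8-bit images with peak value 255) is:

```python
import math

def psnr(ref, test, peak=255.0):
    """Peak signal-to-noise ratio in dB between two equal-length pixel sequences."""
    mse = sum((r - t) ** 2 for r, t in zip(ref, test)) / len(ref)
    if mse == 0:
        return float("inf")   # identical images
    return 10.0 * math.log10(peak ** 2 / mse)

# Toy example: two 2-pixel "images" differing by 10 gray levels everywhere.
print(f"{psnr([0, 0], [10, 10]):.2f} dB")   # MSE = 100 -> about 28.13 dB
```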
4. Comparison with Prior Approaches
Compared to alternative architectures, many of which restrict approximation to low-order columns or use more error-prone compressors, the single-error compressor design achieves superior efficiency and reliability:
- Traditional approximate compressors reported error rates from 16/256 to 70/256, incurring significant error in outputs.
- The proposed compressor, with one error out of 256 combinations, reduces power consumption and area while delivering an order-of-magnitude improvement in error resilience.
- Tabular comparison confirms consistently lower PDP and MRED for the proposed architecture.
| Metric | Proposed (Jaswal et al., 31 Aug 2025) | Best Alt-Design 1 | Best Alt-Design 2 |
|---|---|---|---|
| Energy | Lowest (91.2 fJ reference) | +27.48% vs. proposed | +30.24% vs. proposed |
| MRED | 0.109% | 0.932% | 2.4% |
| PDP | 91.20 fJ | 129.4 fJ | 159.6 fJ |
5. Trade-offs and Design Implications
This architecture deliberately employs a fully approximate compression path, but approximates only a single logic combination, resulting in minimal error amplification even under worst-case partial product patterns. The selection of gate primitives (NAND/NOR) and elimination of carry chain propagation yields further reductions in switching activity and dynamic power dissipation.
Designers must consider:
- The extremely low error probability makes the design suitable for AI applications where preserving signal quality (SNR) and inference accuracy is critical.
- Aggressive approximation of all columns (not just LSBs) is enabled by the high-accuracy compressor, which would be infeasible with higher-error alternatives.
- System-level accuracy remains within the tolerance thresholds for both image quality metrics and classification accuracy, as empirically demonstrated.
6. Applicability and Suitability
The architecture is particularly advantageous for:
- Edge AI hardware, mobile computing, and embedded systems, where stringent area and energy constraints are paramount.
- Real-time convolutional neural networks for vision and denoising, given the superior PSNR and SSIM.
- Broader signal processing and low-power inference engines requiring a favorable trade-off between efficiency and numerical precision.
7. Conclusion
A low power approximate multiplier using a single-error 4:2 compressor integrated in an 8×8 parallel multiplier delivers state-of-the-art energy efficiency and area optimization, with empirically validated minimal error propagation in deep neural network tasks (Jaswal et al., 31 Aug 2025). The architectural balance it achieves—a single switching error per 256 inputs combined with NOR/NAND gate economies—demonstrates the viability of comprehensive approximation for high performance, low-power AI hardware without significant compromise in output quality or accuracy.