LNS Multiply-Accumulate Units
- LNS Multiply-Accumulate Units are specialized circuits that convert multiplication into addition by operating in the logarithmic domain, reducing hardware complexity.
- They employ approximations, lookup tables, and innovative techniques like dual-base decomposition to efficiently manage non-linear operations such as addition and subtraction.
- Advanced designs optimize error control and quantization, achieving significant improvements in area, energy savings, and throughput for applications in machine learning and scientific computing.
A Logarithmic Number System (LNS) Multiply-Accumulate (MAC) Unit is a specialized arithmetic circuit that performs dot-product or sum-product operations in the logarithmic domain, aiming to reduce multiplication complexity in hardware and to efficiently support computation over large dynamic ranges. In an LNS representation, each real number is encoded as a sign bit together with a fixed-point logarithm of its magnitude, so that multiplication and division can be performed as simple addition and subtraction of exponents. This replacement of multiplication by addition is central to the LNS MAC’s purpose. However, LNS is not closed under addition or subtraction, making these operations non-trivial and requiring approximations or look-up tables. Modern LNS MAC research targets significant reductions in area and energy consumption by optimizing both arithmetic units and error control circuits, especially for machine learning, signal processing, and scientific computing applications.
1. Principles and Representations
LNS MAC units exploit the property that multiplication in the log domain becomes addition: if $x = (-1)^{s_x} b^{e_x}$ and $y = (-1)^{s_y} b^{e_y}$, then $x \cdot y = (-1)^{s_x \oplus s_y} b^{e_x + e_y}$, where $s_x, s_y$ are sign bits, $b$ is the logarithmic base, and $e_x, e_y$ are the fixed-point exponents.
Multiplication, division, and square root are hardware-efficient: each becomes addition, subtraction, and halving of exponents, respectively. LNS MAC hardware is commonly built around an adder (for exponents), sign-processing logic, and exponent normalization units. The complexity in an LNS MAC arises in addition/subtraction in the log domain because $\log_b(b^{e_x} + b^{e_y}) = \max(e_x, e_y) + \Phi^{+}(-|e_x - e_y|)$, where $\Phi^{+}(d) = \log_b(1 + b^{d})$ is a non-linear correction function, typically approximated.
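To make these identities concrete, the following minimal Python sketch (a purely numerical model with illustrative names, not any cited hardware design) performs a multiply-accumulate entirely in the log domain, using the exact correction function $\Phi^{+}$ that real units approximate with tables or logic.

```python
import math

B = 2.0  # logarithmic base; real designs may choose a non-base-2 value

def to_lns(x):
    """Encode a nonzero real as (sign bit, fixed-point-style exponent)."""
    return (0 if x > 0 else 1, math.log(abs(x), B))

def from_lns(s, e):
    return (-1.0 if s else 1.0) * B ** e

def lns_mul(a, b):
    """Multiplication: XOR the signs, add the exponents."""
    (sa, ea), (sb, eb) = a, b
    return (sa ^ sb, ea + eb)

def phi_plus(d):
    """Exact addition correction Phi+(d) = log_B(1 + B^d), d <= 0."""
    return math.log(1.0 + B ** d, B)

def lns_add_same_sign(a, b):
    """Log-domain addition of two same-sign operands."""
    (sa, ea), (_, eb) = a, b
    hi, lo = max(ea, eb), min(ea, eb)
    return (sa, hi + phi_plus(lo - hi))

# Example: 3.0 * 2.5 + 1.5 evaluated entirely in the log domain
prod = lns_mul(to_lns(3.0), to_lns(2.5))
acc = lns_add_same_sign(prod, to_lns(1.5))
print(from_lns(*acc))  # ~9.0
```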
Recent designs move beyond base-2 systems (Alam et al., 2021) by choosing LNS bases and scaling factors that better fit quantized data distributions, providing lower average error rates and facilitating logic-based (rather than ROM-based) implementations of the correction table. Hybrid representations, such as dual-base decompositions, support scalable hardware pipelines (Johnson, 2020).
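The influence of the base can be sensed with a small experiment. The sketch below is only an illustration (not the procedure of Alam et al., 2021): it measures the mean relative quantization error of a log-domain encoding with a fixed number of fractional bits as the base varies, since for a fixed word length the base trades the size of the log-domain step against dynamic range.

```python
import math
import random

def mean_rel_error(base, frac_bits, samples):
    """Quantize log_base(x) to frac_bits fractional bits, then measure error."""
    scale = 1 << frac_bits
    total = 0.0
    for x in samples:
        e = round(math.log(x, base) * scale) / scale   # fixed-point exponent
        total += abs(base ** e - x) / x
    return total / len(samples)

random.seed(0)
data = [random.lognormvariate(0.0, 1.0) for _ in range(10_000)]
for b in (2.0, 1.9, 1.5, math.e):
    print(f"base={b:.3f}  mean relative error={mean_rel_error(b, 6, data):.5f}")
```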
2. Addition, Subtraction, and Error Control
Addition and subtraction in LNS are not closed and require approximations of the correction functions $\Phi^{+}(d) = \log_b(1 + b^{d})$ (addition) and $\Phi^{-}(d) = \log_b\lvert 1 - b^{d} \rvert$ (subtraction), with $d = -|e_x - e_y| \le 0$.
These are typically implemented via table lookup, piecewise linear segments, or Taylor interpolations. Rigorous analyses have established tight error bounds for first-order Taylor, error-corrected, and co-transformed approximations, accounting for table interpolation, rounding, and fixed-point multiplication errors (Nguyen et al., 30 Jan 2024). For subtraction, co-transformation techniques decompose the problem into manageable intervals by chaining table lookups and interpolations over critical ranges where derivatives diverge, maintaining bounded error.
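As one concrete and deliberately simple instance, the sketch below builds a first-order (value plus slope) table for $\Phi^{+}$ in base 2 and measures its worst-case error over a dense sweep; the table spacing and cutoff are illustrative choices, not the provably tight parameters analyzed in (Nguyen et al., 30 Jan 2024).

```python
import math

B = 2.0
STEP = 1.0 / 16          # segment width in the log domain (illustrative)
D_MIN = -16.0            # below this threshold Phi+ is treated as zero

def phi_plus(d):
    """Exact addition correction Phi+(d) = log_B(1 + B^d), for d <= 0."""
    return math.log(1.0 + B ** d, B)

# One (value, slope) entry per segment [-i*STEP - STEP, -i*STEP].
segments = []
for i in range(int(-D_MIN / STEP) + 1):
    right = -i * STEP
    slope = (phi_plus(right) - phi_plus(right - STEP)) / STEP
    segments.append((phi_plus(right), slope))

def phi_plus_approx(d):
    """First-order (value + slope * offset) table interpolation."""
    if d < D_MIN:
        return 0.0
    i = int(-d / STEP)
    right = -i * STEP
    value, slope = segments[i]
    return value + slope * (d - right)

# Worst-case error over a dense sweep of the useful argument range
worst = max(abs(phi_plus_approx(-k / 1000.0) - phi_plus(-k / 1000.0))
            for k in range(16000))
print(f"max abs error: {worst:.2e}")   # on the order of 1e-4 for these choices
```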
For hardware efficiency and robust numerical behavior in training, piecewise linear approximations can be quantization-aware, with slopes as powers of two (“bit-shiftable”) and offsets tailored to specific precision levels; simulated annealing optimizes bin boundaries to minimize quantization-induced error (Hamad et al., 20 Oct 2025).
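The shift-and-add flavor of such approximations can be sketched in fixed-point Python as follows. The segment boundaries, shift amounts, and offsets here are placeholders fitted by a trivial min-max rule, not the simulated-annealing-optimized parameters of (Hamad et al., 20 Oct 2025), so the resulting error is deliberately coarse; the point is only that each segment evaluation reduces to one arithmetic shift and one add.

```python
import math

B = 2.0
FRAC = 12                     # fractional bits of the fixed-point format

def phi_plus(d):
    return math.log(1.0 + B ** d, B)

# Illustrative segments: (d_low, d_high, right-shift amount).
# The shift encodes a power-of-two slope, so slope * d becomes d >> shift.
SEGMENTS = [(-1.0, 0.0, 1), (-2.0, -1.0, 2), (-4.0, -2.0, 3), (-8.0, -4.0, 6)]

def fit_offsets():
    """Pick, per segment, the fixed-point offset minimizing worst-case error."""
    offsets = []
    for lo, hi, shift in SEGMENTS:
        residuals = [phi_plus(lo + (hi - lo) * t / 256) -
                     (lo + (hi - lo) * t / 256) * 2.0 ** -shift
                     for t in range(257)]
        offsets.append(round((max(residuals) + min(residuals)) / 2 * (1 << FRAC)))
    return offsets

OFFSETS = fit_offsets()

def phi_plus_shift_add(d_fixed):
    """Evaluate Phi+ on a fixed-point argument using only a shift and an add."""
    d = d_fixed / (1 << FRAC)
    for (lo, hi, shift), offset in zip(SEGMENTS, OFFSETS):
        if lo <= d <= hi:
            return (d_fixed >> shift) + offset   # arithmetic shift + offset
    return 0                                     # |d| large: correction ~ 0

# Accuracy check; coarse by design because the segments are not optimized.
worst = max(abs(phi_plus_shift_add(-k) / (1 << FRAC) - phi_plus(-k / (1 << FRAC)))
            for k in range(8 << FRAC))
print(f"max abs error: {worst:.3f}")
```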
3. Hardware Design Efficiency
LNS MAC units eliminate multipliers, instead relying on adders and, for corrections, combinational logic and lookup tables.
- Base selection is critical for low-error performance; non-base-2 choices can offer lower conversion and arithmetic errors for short word lengths, facilitating efficient logic-based realizations that save up to 90% transistor area compared to ROM implementations (Alam et al., 2021).
- Dual-base architectures (Johnson, 2020) partition each number across two logarithmic bases, enabling efficient pipelined computation of exponentials and logarithms using shift-and-add and Euler integration, achieving 1 ulp error with chip area down to 0.23–0.37× that of comparable CORDIC designs.
- Quantization-aware approximations (Hamad et al., 20 Oct 2025) achieve up to 32.5% area and 53.5% energy reductions over linear fixed-point MACs by encoding segments of the addition correction function as bit-shifts plus offsets.
- Accumulator width optimization (Sakr et al., 2019): Analytical models relate accumulation length and mantissa size to the variance retention ratio (VRR), enabling tailored accumulator sizing that can reduce bit-width by 1.5–2.2× without accuracy loss (a rough sizing sketch follows at the end of this section).
Energy-per-MAC is often substantially lower than in floating-point or fixed-point systems, and pipelined architectures permit high throughput at a given clock rate.
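To give a sense of scale for the accumulator-sizing point above, the sketch below computes the conventional worst-case (never-overflow) accumulator width for an N-term dot product and then naively applies the reported 1.5–2.2× reduction factor. It is a back-of-the-envelope illustration, not the VRR analysis of (Sakr et al., 2019).

```python
import math

def worst_case_acc_bits(operand_bits: int, length: int) -> int:
    """Accumulator width that can never overflow an N-term dot product."""
    product_bits = 2 * operand_bits              # full-precision product width
    guard_bits = math.ceil(math.log2(length))    # growth from N accumulations
    return product_bits + guard_bits

for n in (256, 4096, 65536):
    full = worst_case_acc_bits(8, n)
    # Naively applying the reported 1.5-2.2x reduction, purely for scale.
    lo, hi = math.ceil(full / 2.2), math.ceil(full / 1.5)
    print(f"N={n:>6}: worst-case {full} bits; statistically sized ~{lo}-{hi} bits")
```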
4. Training and Inference in Deep Networks
LNS MACs have been deployed for both training and inference:
- LNS-Madam optimizer (Zhao et al., 2021): Co-design of an LNS representation with a multiplicative weight update avoids loss of gradient information due to quantization gaps, permitting 8-bit low-precision training to match full-precision accuracy while yielding >90% energy reduction relative to FP32. Multiplicative updates (additions in the log domain) keep quantization error independent of weight magnitude, ensuring stable training even at low precision (see the sketch after this list); custom hardware processing elements use fast exponent addition, efficient LUT power-of-two shifting, and optimized conversion circuits.
- Universal function approximation: LNS-based neural networks retain universality, whereas morphological (max-sum, signed max-sum, max*-sum) network variants do not. The LNS MAC, coupled with appropriate activation, preserves the ability to approximate arbitrary continuous functions (Chang et al., 2022).
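A property underpinning the LNS-Madam pairing is that quantization in the log domain has a relative error that does not grow with magnitude. The sketch below checks this numerically; it is only a property check with illustrative bit-widths, not the LNS-Madam update rule itself.

```python
import math

FRAC_BITS = 5                      # fractional bits of the log-domain exponent
SCALE = 1 << FRAC_BITS

def quantize_log_domain(w: float) -> float:
    """Round log2|w| to the fixed-point grid and reconstruct the value."""
    return math.copysign(2.0 ** (round(math.log2(abs(w)) * SCALE) / SCALE), w)

def quantize_linear(w: float, step: float = 1e-3) -> float:
    """Uniform (fixed-point style) quantization for comparison."""
    return round(w / step) * step

for w in (0.00123, 0.0456, 7.8912, 1234.5678):
    lq, uq = quantize_log_domain(w), quantize_linear(w)
    print(f"w={w:>10}:  log-domain rel. err {abs(lq - w) / w:.1e}   "
          f"uniform rel. err {abs(uq - w) / w:.1e}")
```

The log-domain relative error stays near a constant set by the fractional bit-width, while the uniform quantizer's relative error swings by orders of magnitude across the same inputs.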
5. Alternative and Hybrid Multiply-Accumulate Schemes
Several designs extend the LNS MAC concept for application-specific requirements:
- Weight-Sharing PASM (Garland et al., 2016, Garland et al., 2018): In CNN accelerators employing weight sharing, image/activation data is accumulated into bins indexed by the shared weights, so hardware multipliers are replaced with accumulators and selection logic. This design, while not a pure LNS implementation, achieves a similar elimination of multipliers in the fixed-point domain, with observed reductions of up to 70% in power and logic gate count (the binning scheme is sketched after this list).
- Analog delay-based MACs (Shukla, 2020): Mixed-signal designs for DNN inference use linearly tuneable delay cells cascaded for bit-weighted accumulation, bypassing digital multiplication entirely; energy-per-MAC is ~23 fJ/bit, with up to 5-bit precision support under tight jitter constraints.
- FPGA memory-centric MAC architectures (Chen et al., 2023): Compute-in-BRAM schemes use small dummy arrays with hybrid bit-serial/bit-parallel dataflow to perform in-situ MAC computation and achieve 1.7–2.6× throughput at a modest area increase (3.4–6.8%).
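The bin-accumulation idea behind the weight-sharing scheme can be modeled in a few lines. The function name `shared_mac` and the toy codebook below are illustrative; real accelerators implement the binning with selection logic and hardware accumulators rather than Python dictionaries.

```python
from collections import defaultdict

def shared_mac(activations, weight_indices, weight_values):
    """Weight-sharing MAC: accumulate activations per weight bin, multiply per bin.

    activations    : input values
    weight_indices : per-activation index into the shared-weight codebook
    weight_values  : the (small) codebook of distinct weight values
    """
    bins = defaultdict(int)
    for act, idx in zip(activations, weight_indices):
        bins[idx] += act                      # selection logic + accumulator only
    # One multiply per *distinct* weight, instead of one per activation.
    return sum(weight_values[idx] * total for idx, total in bins.items())

# Example: 8 activations but only 3 distinct (shared) weights
acts = [3, 1, 4, 1, 5, 9, 2, 6]
idxs = [0, 1, 2, 0, 1, 2, 0, 1]
codebook = [0.5, -1.0, 2.0]
print(shared_mac(acts, idxs, codebook))                          # 17.0
print(sum(a * codebook[i] for a, i in zip(acts, idxs)))          # matches
```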
6. Scalability and Practical Deployment
LNS MAC units are deployed in:
- High-throughput linear algebra and machine learning accelerators, where multiply/add ratios approach 1:1 (Johnson, 2020).
- FPGA logic and in-memory computing: Table-Lookup MAC (TLMAC) frameworks (Gerlinghoff et al., 18 Mar 2024) compile quantised networks to reusable LUT arrays, clustering similar weights for logic and routing efficiency and scaling to entire ImageNet models. While not strictly LNS, TLMAC’s core principle of precompiling multiplication responses and maximizing their reuse suggests possible adaptations for LNS-based systems (see the sketch after this list).
- AI accelerators supporting multi-format computation: “Jack units” (Noh et al., 7 Jul 2025) achieve area/power scalability across INT, FP, MX formats via precision-scalable carry-save multipliers and 2D sub-word parallelism. Methodologies from Jack units—early shifting, flexible bit-width operations—are applicable to LNS MAC design for area and power efficiency.
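The precompiled-lookup principle itself is easy to model: for low-precision operands, every possible product is tabulated once and each MAC becomes a table read plus an add. The sketch below illustrates only that principle; it does not reproduce TLMAC’s weight clustering or place-and-route optimizations, and all names and sizes are illustrative.

```python
def build_product_lut(weight_levels, act_bits):
    """Precompute products of each quantized weight with every activation code."""
    return [[w * a for a in range(1 << act_bits)] for w in weight_levels]

def lut_mac(act_codes, weight_ids, lut):
    """Dot product via table lookups and additions only (no multipliers)."""
    return sum(lut[w_id][a] for a, w_id in zip(act_codes, weight_ids))

# Example: 4-bit activations, a codebook of 4 quantized weight levels
weights = [-2, -1, 1, 3]
lut = build_product_lut(weights, act_bits=4)
acts = [7, 0, 12, 5]             # activation codes in [0, 15]
wids = [3, 1, 0, 2]              # which weight level each position uses
print(lut_mac(acts, wids, lut))  # 3*7 + (-1)*0 + (-2)*12 + 1*5 = 2
```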
7. Future Directions and Ramifications
Error propagation and approximation quality remain central issues for LNS MAC research:
- Rigorous error analysis of correction approximations enables precise resource budgeting and prevents over-provisioning (Nguyen et al., 30 Jan 2024).
- Adaptation to diverse bases and word lengths further tunes error properties.
- Approximation-aware optimization (simulated annealing, quantization-aware fitting) is essential to avoid error accumulation in training and deployment (Hamad et al., 20 Oct 2025).
- Emerging research explores hybrid architectures combining lookup-based multipliers, piecewise linear addition tables, and highly pipelined LNS arithmetic units. These advances are relevant for scientific computing, signal processing, edge AI, and other domains demanding low energy, reduced area, and reliable high-dynamic-range computing.
A plausible implication is that with continued refinement of approximation methods and integration of hardware-efficient designs, LNS MAC units will become increasingly competitive as mainstream primitives for next-generation accelerators and energy-constrained AI systems.