Mitch System: Approximate Log-Domain Multipliers
- The Mitch system is a family of log-domain multiplier designs that approximate multiplication via truncated Mitchell arithmetic (Mitch-w6) while keeping additions exact.
- It enhances hardware efficiency by reducing circuit complexity and energy consumption, achieving up to 80% energy savings in CNN inference.
- Empirical results show a slight average bias (~-5.9%) and bounded error (<12%), with minimal impact on overall CNN accuracy through layer-wise scaling adjustments.
The Mitch system refers to a class of approximate log-domain multipliers designed to accelerate deep convolutional neural network (CNN) inference by reducing arithmetic and circuit complexity, most notably embodied in the Mitch-w6 architecture. By intentionally approximating multiplication operations—while preserving exact addition—the Mitch system achieves substantial hardware area and energy savings with minimal loss in model prediction accuracy, provided specific analytic and architectural considerations are met (Kim et al., 2020).
1. Mitch-w6: Approximate Log-Domain Multiplier Design
The Mitch-w6 multiplier is an "iterative truncated" adaptation of Mitchell's 1962 log-domain multiplication scheme. For two signed inputs $x$ and $y$, the algorithm proceeds as follows:
- Each input is decomposed into a sign bit and a log-domain absolute value: $s_x$ and $\log_2|x| \approx k_x + m_x$, where $k_x$ is the position of the leading one and $m_x \in [0, 1)$ is the fractional mantissa; similarly $s_y$ and $k_y + m_y$ for $y$.
- The exact product is given by $|x \cdot y| = 2^{k_x + k_y}(1 + m_x)(1 + m_y)$, with sign $s_x \oplus s_y$.
- Mitch-w6 substitutes the exact mantissa product with a small linear table lookup and a single correction iteration, explicitly discarding the lower bits of the mantissa sum (keeping the $w = 6$ most significant bits).
- Operationally, the core multiply step uses Mitchell's linear approximation $(1 + m_x)(1 + m_y) \approx 1 + m_x + m_y$, so the product is approximated as $|x \cdot y| \approx 2^{k_x + k_y}(1 + m_x + m_y)$, with a carry into the exponent when $m_x + m_y \geq 1$.
This replaces multipliers with compact shifter-adder blocks and minimal lookup tables, removing the need for a full carry-propagate multiplier tree.
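The steps above can be sketched in a few lines of Python for positive integers. This is a minimal model of the truncated Mitchell datapath, not the exact Mitch-w6 circuit (which additionally uses a one's-complement sign block and a correction lookup):

```python
def mitchw_mult(a: int, b: int, w: int = 6) -> int:
    """Approximate a * b for positive integers via Mitchell's
    log-add-antilog identity, truncating each mantissa to its
    w most significant fractional bits."""
    ka, kb = a.bit_length() - 1, b.bit_length() - 1   # leading-one positions
    # Truncated fixed-point mantissas: m = (x - 2^k) / 2^k with w fractional bits.
    ma = ((a - (1 << ka)) << w) >> ka
    mb = ((b - (1 << kb)) << w) >> kb
    k, s = ka + kb, ma + mb                           # log-domain sum
    if s < (1 << w):
        # No mantissa carry: P ~= 2^k * (1 + m_a + m_b)
        mant, exp = (1 << w) + s, k - w
    else:
        # Mantissa carry: P ~= 2^(k+1) * (m_a + m_b)
        mant, exp = s, k + 1 - w
    return mant << exp if exp >= 0 else mant >> -exp

# Example: mitchw_mult(100, 200) -> 18432 (exact product 20000, ~ -7.8% low)
```

Note that the result is always an underestimate: the linear approximation drops the $m_x m_y$ cross term, and truncation only removes further magnitude, which is the source of the negative bias discussed below.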
2. Error Model and Statistical Bounds
Let $P$ represent the exact product and $\hat{P}$ the approximate result. Defining the relative error as $\epsilon = (\hat{P} - P)/P$, empirical evaluation on random 32-bit input pairs yields:
- Mean $\mathbb{E}[\epsilon] \approx -5.9\%$ (slightly conservative bias)
- Maximum $|\epsilon| < 12\%$
- Sign handling is performed via an independent one's-complement ("C1") block, ensuring error symmetry in sign but with fixed magnitude bias. The error distribution shows low variance and is largely insensitive to operand magnitude range after bit truncation.
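The shape of this error distribution is easy to reproduce with a quick Monte Carlo over random mantissa pairs. The sketch below models the pure Mitchell approximation (without the Mitch-w6 truncation and sign-handling details), so the measured bias is somewhat smaller in magnitude than the reported $-5.9\%$, but the one-sided, bounded character is the same:

```python
import random

random.seed(0)

errs = []
for _ in range(50_000):
    # Fractional mantissas of the two operands; the exponents cancel
    # in the relative-error ratio, so only mantissas matter.
    m1, m2 = random.random(), random.random()
    exact = (1 + m1) * (1 + m2)
    s = m1 + m2
    approx = (1 + s) if s < 1 else 2 * s   # Mitchell's linear antilog
    errs.append(approx / exact - 1)

mean_err = sum(errs) / len(errs)   # small negative bias
worst = min(errs)                  # bounded one-sided error
```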
3. Analytical Rationale: Multiplications Can Be Approximate, Additions Must Be Exact
CNN convolution and fully connected (FC) computations take the canonical form $z = \sum_i w_i \cdot a_i$. If each product is subjected to an independent, bounded error with expected value $\mu$, then by the law of large numbers the total error in $z$ converges to a uniform global scaling: $z \approx (1 + \mu)\sum_i w_i \cdot a_i$. Because ReLU commutes with positive scaling and the argmax of a softmax is unchanged by uniform positive scaling of its inputs, this spatially uniform scaling does not affect feature ordering, and hence model prediction is preserved. In contrast, approximate addition would inject non-uniform, uncorrelated noise into every accumulation step, breaking this invariance and causing irrecoverable output distortion. Therefore, the Mitch system rigorously maintains exact adders while permitting approximated multiplications.
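This convergence can be checked numerically. A minimal sketch, assuming each product's relative error is drawn independently from a bounded band around the reported $-5.9\%$ bias; the accumulated dot product then lands very close to the uniformly rescaled exact result:

```python
import random

random.seed(0)
mu, n = -0.059, 4096     # reported mean multiplier bias; MAC fan-in

w = [random.gauss(0.0, 1.0) for _ in range(n)]   # weights
a = [random.random() for _ in range(n)]          # nonnegative activations

exact = sum(wi * ai for wi, ai in zip(w, a))
# Each product picks up an independent bounded relative error around mu.
approx = sum(wi * ai * (1 + random.uniform(mu - 0.05, mu + 0.05))
             for wi, ai in zip(w, a))

# The accumulation is close to the uniform rescaling (1 + mu) * exact;
# measure the residual relative to the total magnitude of the summands.
total = sum(abs(wi * ai) for wi, ai in zip(w, a))
deviation = abs(approx - (1 + mu) * exact) / total
```

With a fan-in of a few thousand, the residual `deviation` is orders of magnitude below the per-product error band, which is exactly the averaging effect the argument relies on.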
4. Preservation of Accuracy Across Network Layers
(a) Convolution Layers
With high intra-kernel multiply accumulation counts (thousands per output pixel, summing across channels), aggregate multiplication error converges strongly to its mean, resulting in a consistent, layer-wide scaling factor.
(b) Fully-Connected Layers
The large fan-in per neuron in FC layers yields similar error convergence properties. Excluding only the final FC layer from approximation alters Top-5 accuracy by a negligible margin.
(c) Batch Normalization
As the approximate multiplier introduces a fixed negative bias, the uncorrected result would be an exponential attenuation of activations through the depth of the network. The analytic mean scaling $c = 1 + \mu$ therefore requires compensating the batch-normalization running mean and variance during inference as:
$\mu_{\mathrm{BN}} \leftarrow c\,\mu_{\mathrm{BN}}, \qquad \sigma_{\mathrm{BN}}^2 \leftarrow c^2\,\sigma_{\mathrm{BN}}^2$
These per-layer scalar updates restore the original zero-mean/unit-variance normalization attained during training.
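The compensation can be sanity-checked numerically (function and variable names here are hypothetical, for illustration): scaling every pre-normalization input by a constant $c = 1 + \mu$ while multiplying the running mean by $c$ and the running variance by $c^2$ leaves the normalized output unchanged up to the small $\epsilon$ term in the denominator, since $(cx - c\mu_{\mathrm{BN}})/\sqrt{c^2\sigma^2_{\mathrm{BN}}} = (x - \mu_{\mathrm{BN}})/\sqrt{\sigma^2_{\mathrm{BN}}}$:

```python
import math
import random

def batch_norm(x, mean, var, gamma=1.0, beta=0.0, eps=1e-5):
    # Inference-time batch normalization with running statistics.
    return gamma * (x - mean) / math.sqrt(var + eps) + beta

random.seed(0)
c = 1 - 0.059                 # global scale 1 + mu from the multiplier bias
mean, var = 0.7, 2.3          # hypothetical running statistics from training

diffs = []
for _ in range(100):
    x = random.gauss(mean, math.sqrt(var))
    # Compensated statistics reproduce the original output up to eps.
    diffs.append(abs(batch_norm(c * x, c * mean, c * c * var)
                     - batch_norm(x, mean, var)))
```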
(d) Limitations
Grouped or depthwise convolutions (smaller accumulation per output) exhibit diminished error averaging and larger accuracy loss, unless an approximate multiplier with a larger truncation width $w$ (and hence higher precision) is substituted.
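The role of the accumulation count can be illustrated with a toy simulation under the same independent-error assumption as above: the spread of the aggregate scaling factor shrinks roughly as $1/\sqrt{n}$, so a 3×3 depthwise kernel ($n = 9$) retains an order of magnitude more scaling noise than a wide dense accumulation:

```python
import random

random.seed(1)

def scaling_spread(n, trials=500):
    """Std. dev. of the aggregate scale factor over many accumulations of
    n positive products, each with an independent relative error in [-12%, 0]."""
    ratios = []
    for _ in range(trials):
        exact = approx = 0.0
        for _ in range(n):
            p = random.uniform(0.1, 1.0)                  # a positive product
            exact += p
            approx += p * (1 + random.uniform(-0.12, 0.0))
        ratios.append(approx / exact)
    m = sum(ratios) / trials
    return (sum((r - m) ** 2 for r in ratios) / trials) ** 0.5

depthwise = scaling_spread(9)      # e.g. one 3x3 depthwise kernel
dense = scaling_spread(1024)       # e.g. a wide convolution / FC fan-in
```

The non-uniform residual left by small fan-ins is precisely what shows up as the larger accuracy losses for ResNeXt, Xception, and MobileNetV2 in the table below.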
5. Empirical Results on CNN Accuracy
The following table summarizes the increase in single-crop ImageNet validation error, in percentage points (pp), when moving from FP32 to Mitch-w6 inference for several popular architectures (all models quantized to 32-bit fixed-point; batch-norm statistics updated as above; no retraining):
| Network | Δ Top-1 (pp) | Δ Top-5 (pp) |
|---|---|---|
| ResNet-50 | +0.3 | +0.2 |
| ResNet-101 | +0.1 | +0.0 |
| ResNet-152 | +0.1 | +0.1 |
| Inception-v4 | +0.2 | +0.2 |
| Inception-ResNet-v2 | +0.5 | +0.3 |
| ResNeXt-50 (grouped) | +1.3 | +1.2 |
| Xception (depthwise) | +1.7 | +1.6 |
| MobileNetV2 | +1.1 | +1.0 |
For comparison, bfloat16 multipliers with FP32 accumulators incur similar Top-5 error penalties of roughly 0.2 pp on conventional architectures and about 1 pp on grouped/depthwise variants. Increasing the Mitch truncation width $w$ reduces the grouped/depthwise accuracy losses to under 0.5 pp.
6. Hardware and Energy Efficiency
32nm CMOS synthesis results for single multiply-accumulate (MAC) operations:
| Multiplier type | Energy per MAC (pJ) | Relative to bfloat16 |
|---|---|---|
| bfloat16 MAC | 7.0 | 100% |
| Mitch-w6 (16-bit equiv.) | 1.4 | 20% |
| Mitch-w6 (32-bit full) | 4.3 | 61% |
Mitch-w6 thus achieves up to 80% energy savings per MAC compared to bfloat16, along with lower circuit area and a shorter critical path.
7. Significance and Scope
Mitch-w6 occupies a "sweet spot" for wide/deep CNN inference, striking a balance between accuracy and substantial hardware efficiency. The system's architectural principles can be summarized as follows:
- Per-product accuracy is reduced by only a few percent.
- Batch-norm scaling corrects the global bias, containing error growth.
- High-accumulation MAC patterns enforce uniform scaling at each layer.
- The decision logic of modern CNNs depends fundamentally on feature ordering, which is preserved under uniform scaling.
- Substantial savings of 40%–80% in energy are realized relative to traditional floating-point designs.
The analytic justification for exact addition and approximate multiplication guides hardware design for future DNN accelerators, especially where constraints on power and area dominate (Kim et al., 2020).