
Integer-Based Winograd Convolution

Updated 13 February 2026
  • Integer-based Winograd convolution is a family of methods that use integer arithmetic to perform CNN convolutions, reducing multiplications and ensuring efficient inference on hardware accelerators.
  • The approach employs fixed-point, tap-wise quantization, RNS pipelines, and learned transform matrices to manage dynamic range and quantization errors.
  • These methods significantly lower arithmetic complexity, memory footprint, and energy consumption while maintaining accuracy in low-precision settings.

Integer-based Winograd convolution encompasses a family of methods for executing Winograd-accelerated convolutional neural network (CNN) operators using integer arithmetic throughout the entire pipeline. These approaches are specifically motivated by the need for efficient inference of quantized neural networks on hardware accelerators that natively operate on integer data (INT8/INT16), and by the challenge of controlling rounding and scaling errors when applying the classical Winograd transforms, which are typically defined over rational or real-valued fields. Methods in this domain enable both low-bitwidth convolution with minimal accuracy degradation and substantial reductions in arithmetic complexity, memory, and energy consumption.

1. Foundations of Winograd Convolution and Integer Arithmetic

Winograd convolution leverages the Cook–Toom polynomial interpolation algorithm to reduce the number of multiplications required for small spatial convolutions (e.g., 3×3), exploiting the relation

$$Y = A^T \left[ (G g G^T) \odot (B^T d B) \right] A$$

where $g$ is the spatial-domain kernel, $d$ is the input tile, and $\odot$ denotes Hadamard (elementwise) multiplication. $G$, $B^T$, and $A^T$ are transform matrices derived via Lagrange interpolation from a set of sample points. In the prevailing rational (or integer) constructions, these matrices contain fractional or large integer elements, resulting in large dynamic ranges and numerically unstable transforms as tile size grows.
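As a concrete check, the F(2×2, 3×3) instance of this relation can be verified numerically. The sketch below uses the standard Cook–Toom matrices for sample points 0, 1, −1, ∞ and compares the Winograd output against direct correlation; variable names are illustrative:

```python
import numpy as np

# Standard Cook-Toom F(2x2, 3x3) transform matrices (points 0, 1, -1, inf).
B_T = np.array([[1,  0, -1,  0],
                [0,  1,  1,  0],
                [0, -1,  1,  0],
                [0,  1,  0, -1]], dtype=float)
G = np.array([[1,    0,   0],
              [0.5,  0.5, 0.5],
              [0.5, -0.5, 0.5],
              [0,    0,   1]], dtype=float)
A_T = np.array([[1, 1,  1,  0],
                [0, 1, -1, -1]], dtype=float)

rng = np.random.default_rng(0)
d = rng.standard_normal((4, 4))   # 4x4 input tile
g = rng.standard_normal((3, 3))   # 3x3 kernel

# Winograd path: Y = A^T [(G g G^T) . (B^T d B)] A
U = G @ g @ G.T                   # kernel transform
V = B_T @ d @ B_T.T               # input transform
Y = A_T @ (U * V) @ A_T.T         # Hadamard product + output transform

# Direct 2x2 "valid" correlation for comparison.
Y_ref = np.array([[np.sum(d[i:i + 3, j:j + 3] * g) for j in range(2)]
                  for i in range(2)])

assert np.allclose(Y, Y_ref)
```

The Hadamard stage uses 16 multiplies per 2×2 output tile versus 36 for the direct method, i.e., the 2.25× multiplication reduction for this tile size.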

Integer-based Winograd methods are designed to perform all steps—including the transform, elementwise multiplication, and inverse transform—using only integer arithmetic. This is challenging due to the need to accurately represent the transform matrices, control dynamic ranges, and avoid information loss due to quantization. Several principal approaches have been developed to address these obstacles, described in the sections that follow.

Each approach provides different trade-offs between implementation complexity, precision, hardware efficiency, and convolutional layer size.

2. Integer-based Pipeline Construction and Quantization Schemes

Implementing an integer-only Winograd pipeline requires handling quantization at multiple levels:

Fixed-point Quantized Transforms

All tensors—input tiles ($d$), kernel tiles ($g$), transform matrices ($G$, $B$, $A$), and the Winograd-domain tensors ($U$, $V$, $M$, $Y$)—are quantized to $k$-bit signed integers via per-tensor symmetric quantization:

$$\text{scale}_X = \frac{\max |X|}{2^{k-1}-1}, \qquad \widehat{X} = \text{round}(X / \text{scale}_X)$$

This process is applied to the spatial tensors as well as the Winograd-domain tensors, ensuring that all matrix multiplications and additions occur in integer space (Fernandez-Marques et al., 2020).
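A minimal sketch of this per-tensor symmetric scheme (function and variable names are illustrative, not the paper's code):

```python
import numpy as np

def quantize_symmetric(x, k=8):
    """Per-tensor symmetric k-bit quantization: x is approximated by scale * x_int."""
    qmax = 2 ** (k - 1) - 1                    # e.g. 127 for int8
    scale = np.max(np.abs(x)) / qmax           # one scale for the whole tensor
    x_int = np.clip(np.round(x / scale), -qmax, qmax).astype(np.int32)
    return x_int, scale

x = np.array([-1.0, -0.25, 0.0, 0.5, 1.27])
x_int, s = quantize_symmetric(x, k=8)
x_hat = x_int * s                              # dequantized approximation
# the largest magnitude maps to +/-127; per-element error is bounded by 0.5 * scale
assert np.max(np.abs(x - x_hat)) <= 0.5 * s + 1e-12
```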

Tap-wise Quantization

Dynamic ranges of Winograd-domain elements (the "taps") can vary significantly across positions. Tap-wise quantization allocates an individual scale factor to each tap of both $GgG^T$ and $B^TdB$, implemented as two 6×6 arrays ($S_G$, $S_B$) for $F_4$. These scale factors are constrained to powers of two, permitting rescaling via integer shift operations and eliminating floating-point multipliers from the main loop. Training can tune the scales via straight-through log-gradient estimators, often with knowledge distillation from the full-precision model; calibration via sample maxima with rounding to the nearest power of two is also used (Andri et al., 2022).
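The calibration path can be sketched as follows, assuming a batch of transformed 6×6 tiles; the function names and shapes are illustrative:

```python
import numpy as np

def po2_tap_shifts(wino_tensor, k=8):
    """Calibrate one power-of-two scale per Winograd tap from sample maxima.
    wino_tensor: (n_tiles, 6, 6) batch of transformed tiles (F_4, r = 3)."""
    qmax = 2 ** (k - 1) - 1
    tap_max = np.max(np.abs(wino_tensor), axis=0)           # (6, 6) per-tap maxima
    # Round the real-valued scale to the nearest power of two so that
    # rescaling in hardware becomes a plain integer shift.
    return np.round(np.log2(tap_max / qmax)).astype(int)    # (6, 6) shift exponents

def quantize_tapwise(wino_tensor, shifts, k=8):
    qmax = 2 ** (k - 1) - 1
    scales = 2.0 ** shifts                                  # per-tap power-of-two scales
    return np.clip(np.round(wino_tensor / scales), -qmax, qmax).astype(np.int32)

# Synthetic Winograd-domain activations with strongly position-dependent range.
V = np.random.default_rng(1).standard_normal((32, 6, 6)) \
    * np.linspace(0.1, 8.0, 36).reshape(6, 6)
shifts = po2_tap_shifts(V)
V_int = quantize_tapwise(V, shifts)
```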

Residue Number System (RNS) Pipeline

RNS–Winograd absorbs all denominators of the rational-valued $G$ into a single scalar $\alpha$ and implements the transforms modulo a set of small, pairwise coprime integers $\{m_i\}$. All linear-algebraic steps are performed modulo each $m_i$, and the output is reconstructed using the Chinese Remainder Theorem (CRT) or Mixed-Radix Conversion (MRC). This approach is exact and sidesteps scaling and rounding issues outright, at the cost of increased code and hardware complexity for residue-channel management and final reconstruction (Liu et al., 2020).
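A toy illustration of the residue pipeline, with hypothetical 8-bit-friendly moduli: each channel multiplies independently modulo $m_i$, and CRT recovers the exact product as long as it stays below the product of the moduli:

```python
from math import prod

def to_rns(x, moduli):
    """Decompose an integer into its residue channels."""
    return [x % m for m in moduli]

def rns_mul(a, b, moduli):
    """Channel-wise multiplication, no cross-channel carries."""
    return [(x * y) % m for x, y, m in zip(a, b, moduli)]

def crt_reconstruct(residues, moduli):
    """Chinese Remainder Theorem: recover x mod prod(moduli)."""
    M = prod(moduli)
    x = 0
    for r, m in zip(residues, moduli):
        Mi = M // m
        x += r * Mi * pow(Mi, -1, m)   # pow(., -1, m) is the modular inverse
    return x % M

moduli = [251, 253, 255, 256]          # pairwise coprime, each fits in 8 bits
a, b = 12345, 6789
c = crt_reconstruct(rns_mul(to_rns(a, moduli), to_rns(b, moduli), moduli), moduli)
assert c == a * b                      # exact, since a * b < prod(moduli)
```

The three-argument `pow` for modular inverses requires Python 3.8+.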

Integer Filter Scaling

For approaches where post-transform weights exceed the available bit-width, a per-position scaling factor is applied to compress each Winograd-domain coefficient into the desired integer range before elementwise multiplication. These factors and their inverses are managed explicitly per tap, and the results are rescaled after accumulation (Meng et al., 2019).
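A hedged sketch of this per-position compress/rescale pattern, with one scale per tap shared across output channels (values and names are illustrative):

```python
import numpy as np

rng = np.random.default_rng(2)
# Hypothetical Winograd-domain weights for 8 output channels (F_2, 4x4 taps);
# some taps exceed the int8 range after the G g G^T transform.
U = rng.standard_normal((8, 4, 4)) * 40.0
V_int = rng.integers(-128, 128, size=(4, 4))        # quantized input tile

tap_scale = np.max(np.abs(U), axis=0) / 127.0       # one factor per tap position
U_int = np.round(U / tap_scale).astype(np.int32)    # now |U_int| <= 127
M_int = U_int * V_int                               # integer Hadamard products
M = M_int * tap_scale                               # rescale after accumulation
# per-tap quantization error is bounded by 0.5 * tap_scale * |V|
assert np.all(np.abs(M - U * V_int) <= 0.5 * tap_scale * np.abs(V_int) + 1e-9)
```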

3. Condition Number, Numerical Stability, and Learned Transforms

Numerical instability in Winograd transforms—primarily arising from the high condition number $\kappa_2$ of the underlying Vandermonde matrices—has proven to be the core obstacle when scaling tile size or reducing precision. Standard integer-point sampling leads to exponential growth of $\kappa_2$ with tile size (e.g., $\kappa_2 \approx 2\times10^5$ for F(8,3)), making INT8 and even FP16 inference unreliable for large tiles (Lohia, 20 Dec 2025).

Recent work optimizes the sample points as a continuous search problem (the NOVA framework), identifying well-conditioned rational configurations (e.g., $\{\pm 5/6,\ \pm 7/6,\ \pm 3/5\}$) with vastly improved stability. These new transforms permit drop-in replacement in Winograd pipelines, restoring accuracy lost to instability and enabling large-tile, integer-only convolutions with theoretical arithmetic-complexity savings (Lohia, 20 Dec 2025). Learning the transform matrices $G$, $B$, $A$ during quantization-aware training—rather than fixing their structure—also allows the model to recover from otherwise catastrophic quantization errors at low precision (Fernandez-Marques et al., 2020).
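The conditioning argument is easy to reproduce: build the square Vandermonde matrix for a candidate point set and compare $\kappa_2$. The point sets below are illustrative (classic integer points versus small symmetric rationals in the NOVA style), not a specific published configuration:

```python
import numpy as np

def vandermonde_cond(points):
    """kappa_2 of the square Vandermonde matrix V[i, j] = p_i ** j,
    the source of Winograd transform ill-conditioning."""
    return np.linalg.cond(np.vander(points, increasing=True))

integer_pts  = [0, 1, -1, 2, -2, 3]                  # classic integer sampling
rational_pts = [5/6, -5/6, 7/6, -7/6, 3/5, -3/5]     # NOVA-style small rationals

k_int = vandermonde_cond(integer_pts)
k_rat = vandermonde_cond(rational_pts)
# symmetric fractional points near the unit circle are far better conditioned
assert k_rat < k_int
```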

4. Hardware Implementation, Practical Integration, and Complexity

Integer-based Winograd is specifically tailored for hardware efficiency.

Integer Bit-Pipeline

A modern pipeline processes spatial-domain activations and weights as 8-bit signed integers. Post-transform tiles are bit-extended (+2 bits for inputs, +3 for weights), scaled tap-wise, and quantized back to int8. The per-tap MAC operations are performed in int32 and rescaled by power-of-two shifts. The inverse output transform (the $A^T \ldots A$ sequence) consists of integer additions, subtractions, and shifts, with explicit bit-width tracking and final int8 clamping (Andri et al., 2022).
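The power-of-two rescaling step can be sketched as a round-then-shift on the accumulator (a minimal illustration, not the accelerator's exact datapath):

```python
import numpy as np

def requantize_po2(acc, shift):
    """Rescale an integer accumulator by 2**-shift with round-to-nearest
    (add half an LSB before the arithmetic right shift), then clamp to int8."""
    rounded = (acc + (1 << (shift - 1))) >> shift
    return np.clip(rounded, -128, 127).astype(np.int8)

acc = np.array([1000, -1000, 70000, -70000, 5], dtype=np.int64)
out = requantize_po2(acc, shift=8)        # effective divide by 256
# 1000/256 ~ 3.9 rounds to 4; 70000/256 ~ 273 clamps to the int8 limit
assert list(out) == [4, -4, 127, -128, 0]
```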

Custom Hardware Blocks

Architectures implementing integer Winograd often feature distinct units for each transform and multiplication step (IN_XFORM for $B^T d B$, WT_XFORM for $G g G^T$, FixPipe for the $A^T \ldots A$ output transform), specialized SRAM blocks, and dataflow optimizations such as double-buffered loading and on-the-fly weight transform. Specific resource and energy costs are reported: e.g., a Cube unit at 1 TOP/s @ 500 MHz (1.92 W with $F_4$), Winograd-specific MTE engines for input/output transforms (~0.1–0.3 mm² each), and area/power overheads for the Winograd logic (e.g., +6.1% area and +17% power for the Cube, amortized by the overall cycle savings) (Andri et al., 2022).

Arithmetic Complexity and Speedup

The theoretical arithmetic reduction realized by Winograd increases with tile size: for F(2,3), a 2.25× reduction (9 → 4 multiplies per output); for F(4,3), 4× (36 → 9); for large tiles, up to 7.03× (e.g., M=12, R=5 filters). Effective speedup, after accounting for transforms, scaling, and bit-width management, ranges from 1.5× ($F_2$) to 3.3× ($F_4$, large batches/feature maps) over efficient im2col/INT8 baselines, contingent on network architecture and hardware parameters (Andri et al., 2022, Liu et al., 2020). Overheads for the Winograd transforms and CRT/MRC reconstruction are generally 10–20% of runtime for large tiles (Liu et al., 2020).
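The leading-order multiply counts are straightforward to tabulate (a sketch; realized speedups are lower once transform and rescaling costs are included):

```python
def winograd_mul_reduction(m, r):
    """Multiplies for a direct m x m output tile vs. 2D Winograd F(m, r)."""
    direct = (m ** 2) * (r ** 2)       # m^2 outputs, r^2 multiplies each
    wino = (m + r - 1) ** 2            # one multiply per Winograd-domain tap
    return direct / wino

assert winograd_mul_reduction(2, 3) == 2.25   # F(2,3): 36 -> 16 per 2x2 tile
assert winograd_mul_reduction(4, 3) == 4.0    # F(4,3): 144 -> 36 per 4x4 tile
```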

5. Accuracy, Error Analysis, and Robustness

Theoretical quantization error per tap is bounded by $0.5 \cdot S_{\text{tap}}$; with tap-wise scaling, the total output-tile error is $\sim 36 \cdot \max_{\text{tap}}(S_{\text{tap}})$. Empirical evaluations show:

  • Top-1 accuracy on ImageNet (ResNet-34, FP32 baseline 72.6%):
    • $F_4$ with shared quantization, int8/10: 69.1% (−3.5%)
    • $F_4$ tap-wise (float scales): 72.0% (−0.6%)
    • $F_4$ tap-wise, power-of-two scales, KD, int8/10: 72.3% (−0.3%)
  • CIFAR-10/ResNet-20: <0.1% accuracy drop vs. baseline for tap-wise $F_4$ with int8/9 (Andri et al., 2022).
  • RNS–Winograd: no measurable accuracy loss on VGG-16 (8-bit, F(14,3) tile) (Liu et al., 2020).

Learning integer-friendly transforms further closes the accuracy gap for large tiles at INT8, frequently achieving full recovery of FP32 accuracy (e.g., INT8 Winograd-aware $F_4$ (learned) on ResNet-18: 93.54% CIFAR-10, vs. 93.16% for im2row) (Fernandez-Marques et al., 2020). Classical integer-point Winograd with large tiles can collapse at low precision (e.g., F(6,3) in FP16 drops VGG-16 to 4.7% accuracy), but conditioning-optimized (NOVA) transforms restore accuracy to 75.3% (Lohia, 20 Dec 2025).

6. Comparative Methods and Limitations

Classical Integerization and Filter Scaling

Standard methods—such as scaling $G$ to be integer-valued, with per-position scaling factors and compressed integer storage—yield substantial bit-width savings (e.g., a 30.77% reduction, 13 → 9 bits, for F(2×2, 3×3)) at negligible accuracy cost (<0.3% on ResNet-50/InceptionV3) (Meng et al., 2019).

RNS-based Approach

RNS–Winograd provides exact, overflow-robust integer arithmetic by representing all computations modulo small $m_i$. This enables numerically stable, large-tile (10×10 and up) transforms with up to 7.03× reduction for 5×5 kernels, but it increases code complexity and demands additional hardware or software support for multi-channel residue management and output reconstruction (Liu et al., 2020).

Relaxed/Learned Transform Matrices

Making $G$, $B$, $A$ trainable (as opposed to fixed Cook–Toom structures) dramatically improves robustness at very low precision and for large tiles, especially when the integer-quantized pipeline is included directly in every forward pass during training (Fernandez-Marques et al., 2020, Lohia, 20 Dec 2025).

Limitations and Trade-offs

Integer-based Winograd approaches must balance transform overhead, per-tap scale management, hardware resource constraints, and the risk of overflow in accumulations. The efficacy of the arithmetic complexity reduction is constrained by practical considerations such as the throughput of integer MAC units, the transform-to-MAC ratio, and per-layer variations. For RNS, the net gain may evaporate for small tiles or when too many residue channels are required.

7. Impact, Benchmarks, and Future Directions

The deployment of integer-based Winograd techniques has enabled:

  • End-to-end energy-efficiency gains of up to 1.85×, speedups of up to 3.3× for large networks and $F_4$ tiles, and robust operation in commercial and experimental low-precision deep learning accelerators (Andri et al., 2022).
  • Compatibility with, and superiority over, industrial baselines (e.g., an $F_4$ accelerator 1.5–3.3× faster than an 8× NVDLA $F_2$ configuration at 8 TOP/s and 41 GB/s bandwidth) (Andri et al., 2022).
  • Recovery of theoretical speedup figures (e.g., 2.25× for $F_2$, 4× for $F_4$, 7.03× for F(12,5)) in quantized pipelines (Lohia, 20 Dec 2025, Liu et al., 2020).
  • Elimination of the classical instability barrier via numerical search and learning of transform points (Lohia, 20 Dec 2025).

Research continues toward larger tile sizes, lower bit-widths, and the integration of Winograd-aware design into neural architecture search and quantization-aware training frameworks. The recurring theme is that careful calibration and learning at the transform and quantization levels are critical to preserving network accuracy and robustness as hardware trends toward ever lower precision.


Key References

  • "Going Further With Winograd Convolutions: Tap-Wise Quantization for Efficient Inference on 4x4 Tile" (Andri et al., 2022)
  • "Efficient Residue Number System Based Winograd Convolution" (Liu et al., 2020)
  • "Searching for Winograd-aware Quantized Networks" (Fernandez-Marques et al., 2020)
  • "NOVA: Discovering Well-Conditioned Winograd Transforms through Numerical Optimization of Vandermonde Arithmetic" (Lohia, 20 Dec 2025)
  • "Efficient Winograd Convolution via Integer Arithmetic" (Meng et al., 2019)
