
Integer-Based Winograd Convolution

Updated 13 February 2026
  • Integer-based Winograd convolution is a family of methods that use integer arithmetic to perform CNN convolutions, reducing multiplications and ensuring efficient inference on hardware accelerators.
  • The approach employs fixed-point, tap-wise quantization, RNS pipelines, and learned transform matrices to manage dynamic range and quantization errors.
  • These methods significantly lower arithmetic complexity, memory footprint, and energy consumption while maintaining accuracy in low-precision settings.

Integer-based Winograd convolution encompasses a family of methods for executing Winograd-accelerated convolutional neural network (CNN) operators using integer arithmetic throughout the entire pipeline. These approaches are specifically motivated by the need for efficient inference of quantized neural networks on hardware accelerators that natively operate on integer data (INT8/INT16), and by the challenge of controlling rounding and scaling errors when applying the classical Winograd transforms, which are typically defined over rational or real-valued fields. Methods in this domain enable both low-bitwidth convolution with minimal accuracy degradation and substantial reductions in arithmetic complexity, memory, and energy consumption.

1. Foundations of Winograd Convolution and Integer Arithmetic

Winograd convolution leverages the Cook–Toom polynomial interpolation algorithm to reduce the number of multiplications required for small spatial convolutions (e.g., 3×3), exploiting the relation

$$Y = A^T \left[ (G g G^T) \odot (B^T d B) \right] A$$

where $g$ is the spatial-domain kernel, $d$ is the input tile, and $\odot$ denotes Hadamard (elementwise) multiplication. $G$, $B^T$, and $A^T$ are transform matrices derived via Lagrange interpolation from a set of sample points. In the prevailing rational (or integer) constructions, these matrices contain fractional or large integer elements, resulting in large dynamic ranges and numerically unstable transforms as tile size grows.
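As a concrete check, the F(2×2, 3×3) instance of this relation can be verified numerically. The sketch below uses the standard Cook–Toom matrices for sample points 0, 1, −1, ∞ and compares the Winograd output against direct correlation; variable names are illustrative:

```python
import numpy as np

# Standard Cook-Toom F(2x2, 3x3) transform matrices (points 0, 1, -1, inf).
B_T = np.array([[1,  0, -1,  0],
                [0,  1,  1,  0],
                [0, -1,  1,  0],
                [0,  1,  0, -1]], dtype=float)
G = np.array([[1,    0,   0],
              [0.5,  0.5, 0.5],
              [0.5, -0.5, 0.5],
              [0,    0,   1]], dtype=float)
A_T = np.array([[1, 1,  1,  0],
                [0, 1, -1, -1]], dtype=float)

rng = np.random.default_rng(0)
d = rng.standard_normal((4, 4))   # 4x4 input tile
g = rng.standard_normal((3, 3))   # 3x3 kernel

# Winograd path: Y = A^T [(G g G^T) . (B^T d B)] A
U = G @ g @ G.T                   # kernel transform
V = B_T @ d @ B_T.T               # input transform
Y = A_T @ (U * V) @ A_T.T         # Hadamard product + output transform

# Direct 2x2 "valid" correlation for comparison.
Y_ref = np.array([[np.sum(d[i:i + 3, j:j + 3] * g) for j in range(2)]
                  for i in range(2)])

assert np.allclose(Y, Y_ref)
```

The Hadamard stage uses 16 multiplies per 2×2 output tile versus 36 for the direct method, i.e., the 2.25× multiplication reduction for this tile size.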

Integer-based Winograd methods are designed to perform all steps—including the transform, elementwise multiplication, and inverse transform—using only integer arithmetic. This is challenging due to the need to accurately represent the transform matrices, control dynamic ranges, and avoid information loss due to quantization. Several principal approaches have been developed to address these obstacles, described in the sections that follow.

Each approach provides different trade-offs between implementation complexity, precision, hardware efficiency, and convolutional layer size.

2. Integer-based Pipeline Construction and Quantization Schemes

Implementing an integer-only Winograd pipeline requires handling quantization at multiple levels:

Fixed-point Quantized Transforms

All tensors—input tiles ($d$), kernel tiles ($g$), transform matrices ($G$, $B$, $A$), and the Winograd-domain tensors ($U$, $V$, $M$, $Y$)—are quantized to $k$-bit signed integers via per-tensor symmetric quantization:

$$\text{scale}_X = \frac{\max |X|}{2^{k-1}-1}, \qquad \widehat{X} = \text{round}(X / \text{scale}_X)$$

This process is applied to the spatial tensors as well as the Winograd-domain tensors, ensuring that all matrix multiplications and additions occur in integer space (Fernandez-Marques et al., 2020).
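A minimal sketch of this per-tensor symmetric scheme (function and variable names are illustrative, not the paper's code):

```python
import numpy as np

def quantize_symmetric(x, k=8):
    """Per-tensor symmetric k-bit quantization: x is approximated by scale * x_int."""
    qmax = 2 ** (k - 1) - 1                    # e.g. 127 for int8
    scale = np.max(np.abs(x)) / qmax           # one scale for the whole tensor
    x_int = np.clip(np.round(x / scale), -qmax, qmax).astype(np.int32)
    return x_int, scale

x = np.array([-1.0, -0.25, 0.0, 0.5, 1.27])
x_int, s = quantize_symmetric(x, k=8)
x_hat = x_int * s                              # dequantized approximation
# the largest magnitude maps to +/-127; per-element error is bounded by 0.5 * scale
assert np.max(np.abs(x - x_hat)) <= 0.5 * s + 1e-12
```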

Tap-wise Quantization

Dynamic ranges of Winograd-domain elements (the "taps") can vary significantly across positions. Tap-wise quantization allocates an individual scale factor to each tap of both $GgG^T$ and $B^TdB$, implemented as two 6×6 arrays ($S_G$, $S_B$) for $F_4$. These scale factors are constrained to powers of two, permitting rescaling via integer shift operations and eliminating floating-point multipliers from the main loop. Training can tune the scales via straight-through log-gradient estimators, often with knowledge distillation from the full-precision model; calibration via sample maxima with rounding to the nearest power of two is also used (Andri et al., 2022).
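The calibration path can be sketched as follows, assuming a batch of transformed 6×6 tiles; the function names and shapes are illustrative:

```python
import numpy as np

def po2_tap_shifts(wino_tensor, k=8):
    """Calibrate one power-of-two scale per Winograd tap from sample maxima.
    wino_tensor: (n_tiles, 6, 6) batch of transformed tiles (F_4, r = 3)."""
    qmax = 2 ** (k - 1) - 1
    tap_max = np.max(np.abs(wino_tensor), axis=0)           # (6, 6) per-tap maxima
    # Round the real-valued scale to the nearest power of two so that
    # rescaling in hardware becomes a plain integer shift.
    return np.round(np.log2(tap_max / qmax)).astype(int)    # (6, 6) shift exponents

def quantize_tapwise(wino_tensor, shifts, k=8):
    qmax = 2 ** (k - 1) - 1
    scales = 2.0 ** shifts                                  # per-tap power-of-two scales
    return np.clip(np.round(wino_tensor / scales), -qmax, qmax).astype(np.int32)

# Synthetic Winograd-domain activations with strongly position-dependent range.
V = np.random.default_rng(1).standard_normal((32, 6, 6)) \
    * np.linspace(0.1, 8.0, 36).reshape(6, 6)
shifts = po2_tap_shifts(V)
V_int = quantize_tapwise(V, shifts)
```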

Residue Number System (RNS) Pipeline

RNS–Winograd absorbs all denominators of the rational-valued $G$ into a single scalar $\alpha$ and implements the transforms modulo a set of small, pairwise coprime integers $\{m_i\}$. All linear-algebraic steps are performed modulo each $m_i$, and the output is reconstructed using the Chinese Remainder Theorem (CRT) or Mixed-Radix Conversion (MRC). This approach is exact and sidesteps scaling and rounding issues outright, at the cost of increased code and hardware complexity for residue-channel management and final reconstruction (Liu et al., 2020).
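A toy illustration of the residue pipeline, with hypothetical 8-bit-friendly moduli: each channel multiplies independently modulo $m_i$, and CRT recovers the exact product as long as it stays below the product of the moduli:

```python
from math import prod

def to_rns(x, moduli):
    """Decompose an integer into its residue channels."""
    return [x % m for m in moduli]

def rns_mul(a, b, moduli):
    """Channel-wise multiplication, no cross-channel carries."""
    return [(x * y) % m for x, y, m in zip(a, b, moduli)]

def crt_reconstruct(residues, moduli):
    """Chinese Remainder Theorem: recover x mod prod(moduli)."""
    M = prod(moduli)
    x = 0
    for r, m in zip(residues, moduli):
        Mi = M // m
        x += r * Mi * pow(Mi, -1, m)   # pow(., -1, m) is the modular inverse
    return x % M

moduli = [251, 253, 255, 256]          # pairwise coprime, each fits in 8 bits
a, b = 12345, 6789
c = crt_reconstruct(rns_mul(to_rns(a, moduli), to_rns(b, moduli), moduli), moduli)
assert c == a * b                      # exact, since a * b < prod(moduli)
```

The three-argument `pow` for modular inverses requires Python 3.8+.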

Integer Filter Scaling

For approaches where post-transform weights exceed the available bit-width, a per-position scaling factor is applied to compress each Winograd-domain coefficient into the desired integer range before elementwise multiplication. These factors and their inverses are managed explicitly per tap, and the results are rescaled after accumulation (Meng et al., 2019).
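A hedged sketch of this per-position compress/rescale pattern, with one scale per tap shared across output channels (values and names are illustrative):

```python
import numpy as np

rng = np.random.default_rng(2)
# Hypothetical Winograd-domain weights for 8 output channels (F_2, 4x4 taps);
# some taps exceed the int8 range after the G g G^T transform.
U = rng.standard_normal((8, 4, 4)) * 40.0
V_int = rng.integers(-128, 128, size=(4, 4))        # quantized input tile

tap_scale = np.max(np.abs(U), axis=0) / 127.0       # one factor per tap position
U_int = np.round(U / tap_scale).astype(np.int32)    # now |U_int| <= 127
M_int = U_int * V_int                               # integer Hadamard products
M = M_int * tap_scale                               # rescale after accumulation
# per-tap quantization error is bounded by 0.5 * tap_scale * |V|
assert np.all(np.abs(M - U * V_int) <= 0.5 * tap_scale * np.abs(V_int) + 1e-9)
```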

3. Condition Number, Numerical Stability, and Learned Transforms

Numerical instability in Winograd transforms—primarily arising from the high condition number $\kappa_2$ of the underlying Vandermonde matrices—has proven to be the core obstacle when scaling tile size or reducing precision. Standard integer-point sampling leads to exponential growth of $\kappa_2$ with tile size (e.g., $\kappa_2 \approx 2\times10^5$ for F(8,3)), making INT8 and even FP16 inference unreliable for large tiles (Lohia, 20 Dec 2025).

Recent work optimizes the sample points as a continuous search problem (the NOVA framework), identifying well-conditioned rational configurations (e.g., $\{\pm 5/6,\ \pm 7/6,\ \pm 3/5\}$) with vastly improved stability. These new transforms permit drop-in replacement in Winograd pipelines, restoring accuracy lost to instability and enabling large-tile, integer-only convolutions with theoretical arithmetic-complexity savings (Lohia, 20 Dec 2025). Learning the transform matrices $G$, $B$, $A$ during quantization-aware training—rather than fixing their structure—also allows the model to recover from otherwise catastrophic quantization errors at low precision (Fernandez-Marques et al., 2020).
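The conditioning argument is easy to reproduce: build the square Vandermonde matrix for a candidate point set and compare $\kappa_2$. The point sets below are illustrative (classic integer points versus small symmetric rationals in the NOVA style), not a specific published configuration:

```python
import numpy as np

def vandermonde_cond(points):
    """kappa_2 of the square Vandermonde matrix V[i, j] = p_i ** j,
    the source of Winograd transform ill-conditioning."""
    return np.linalg.cond(np.vander(points, increasing=True))

integer_pts  = [0, 1, -1, 2, -2, 3]                  # classic integer sampling
rational_pts = [5/6, -5/6, 7/6, -7/6, 3/5, -3/5]     # NOVA-style small rationals

k_int = vandermonde_cond(integer_pts)
k_rat = vandermonde_cond(rational_pts)
# symmetric fractional points near the unit circle are far better conditioned
assert k_rat < k_int
```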

4. Hardware Implementation, Practical Integration, and Complexity

Integer-based Winograd is specifically tailored for hardware efficiency.

Integer Bit-Pipeline

A modern pipeline processes spatial-domain activations and weights as 8-bit signed integers. Post-transform tiles are bit-extended (+2 bits for inputs, +3 for weights), scaled tap-wise, and quantized back to int8. The per-tap MAC operations are performed in int32 and rescaled by power-of-two shifts. The inverse output transform (the $A^T \ldots A$ sequence) consists of integer additions, subtractions, and shifts, with explicit bit-width tracking and final int8 clamping (Andri et al., 2022).
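The power-of-two rescaling step can be sketched as a round-then-shift on the accumulator (a minimal illustration, not the accelerator's exact datapath):

```python
import numpy as np

def requantize_po2(acc, shift):
    """Rescale an integer accumulator by 2**-shift with round-to-nearest
    (add half an LSB before the arithmetic right shift), then clamp to int8."""
    rounded = (acc + (1 << (shift - 1))) >> shift
    return np.clip(rounded, -128, 127).astype(np.int8)

acc = np.array([1000, -1000, 70000, -70000, 5], dtype=np.int64)
out = requantize_po2(acc, shift=8)        # effective divide by 256
# 1000/256 ~ 3.9 rounds to 4; 70000/256 ~ 273 clamps to the int8 limit
assert list(out) == [4, -4, 127, -128, 0]
```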

Custom Hardware Blocks

Architectures implementing integer Winograd often feature distinct units for each transform and multiplication step (IN_XFORM for $B^T d B$, WT_XFORM for $G g G^T$, FixPipe for the $A^T \ldots A$ output transform), specialized SRAM blocks, and dataflow optimizations such as double-buffered loading and on-the-fly weight transform. Specific resource and energy costs are reported: e.g., a Cube unit at 1 TOP/s @ 500 MHz (1.92 W with $F_4$), Winograd-specific MTE engines for input/output transforms (~0.1–0.3 mm² each), and area/power overheads for the Winograd logic (e.g., +6.1% area and +17% power for the Cube, amortized by the overall cycle savings) (Andri et al., 2022).

Arithmetic Complexity and Speedup

The theoretical arithmetic reduction realized by Winograd increases with tile size: for F(2,3), a 2.25× reduction (9 → 4 multiplies per output); for F(4,3), 4× (36 → 9); for large tiles, up to 7.03× (e.g., M=12, R=5 filters). Effective speedup, after accounting for transforms, scaling, and bit-width management, ranges from 1.5× ($F_2$) to 3.3× ($F_4$, large batches/feature maps) over efficient im2col/INT8 baselines, contingent on network architecture and hardware parameters (Andri et al., 2022, Liu et al., 2020). Overheads for the Winograd transforms and CRT/MRC reconstruction are generally 10–20% of runtime for large tiles (Liu et al., 2020).
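The leading-order multiply counts are straightforward to tabulate (a sketch; realized speedups are lower once transform and rescaling costs are included):

```python
def winograd_mul_reduction(m, r):
    """Multiplies for a direct m x m output tile vs. 2D Winograd F(m, r)."""
    direct = (m ** 2) * (r ** 2)       # m^2 outputs, r^2 multiplies each
    wino = (m + r - 1) ** 2            # one multiply per Winograd-domain tap
    return direct / wino

assert winograd_mul_reduction(2, 3) == 2.25   # F(2,3): 36 -> 16 per 2x2 tile
assert winograd_mul_reduction(4, 3) == 4.0    # F(4,3): 144 -> 36 per 4x4 tile
```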

5. Accuracy, Error Analysis, and Robustness

Theoretical quantization error per tap is bounded by $0.5 \cdot S_{\text{tap}}$; with tap-wise scaling, the total output-tile error is $\sim 36 \cdot \max_{\text{tap}}(S_{\text{tap}})$. Empirical evaluations show:

  • Top-1 accuracy on ImageNet (ResNet-34, FP32 baseline 72.6%):
    • $F_4$ with shared quantization, int8/10: 69.1% (−3.5%)
    • $F_4$ tap-wise (float scales): 72.0% (−0.6%)
    • $F_4$ tap-wise, power-of-two scales, KD, int8/10: 72.3% (−0.3%)
  • CIFAR-10/ResNet-20: <0.1% accuracy drop vs. baseline for tap-wise $F_4$ with int8/9 (Andri et al., 2022).
  • RNS–Winograd: no measurable accuracy loss on VGG-16 (8-bit, F(14,3) tile) (Liu et al., 2020).

Learning integer-friendly transforms further closes the accuracy gap for large tiles at INT8, frequently achieving full recovery of FP32 accuracy (e.g., INT8 Winograd-aware $F_4$ (learned) on ResNet-18: 93.54% CIFAR-10, vs. 93.16% for im2row) (Fernandez-Marques et al., 2020). Classical integer-point Winograd with large tiles can collapse at low precision (e.g., F(6,3) in FP16 drops VGG-16 to 4.7% accuracy), but conditioning-optimized (NOVA) transforms restore accuracy to 75.3% (Lohia, 20 Dec 2025).

6. Comparative Methods and Limitations

Classical Integerization and Filter Scaling

Standard methods—such as scaling $G$ to be integer-valued, with per-position scaling factors and compressed integer storage—yield substantial bit-width savings (e.g., a 30.77% reduction, 13 → 9 bits, for F(2×2, 3×3)) at negligible accuracy cost (<0.3% on ResNet-50/InceptionV3) (Meng et al., 2019).

RNS-based Approach

RNS–Winograd provides exact, overflow-robust integer arithmetic by representing all computations modulo small $m_i$. This enables numerically stable, large-tile (10×10 and up) transforms with up to 7.03× reduction for 5×5 kernels, but it increases code complexity and demands additional hardware or software support for multi-channel residue management and output reconstruction (Liu et al., 2020).

Relaxed/Learned Transform Matrices

Making $G$, $B$, $A$ trainable (as opposed to fixed Cook–Toom structures) dramatically improves robustness at very low precision and for large tiles, especially when the integer-quantized pipeline is included directly in every forward pass during training (Fernandez-Marques et al., 2020, Lohia, 20 Dec 2025).

Limitations and Trade-offs

Integer-based Winograd approaches must balance transform overhead, per-tap scale management, hardware resource constraints, and the risk of overflow in accumulations. The efficacy of the arithmetic complexity reduction is constrained by practical considerations such as the throughput of integer MAC units, the transform-to-MAC ratio, and per-layer variations. For RNS, the net gain may evaporate for small tiles or when too many residue channels are required.

7. Impact, Benchmarks, and Future Directions

The deployment of integer-based Winograd techniques has enabled:

  • End-to-end energy-efficiency gains of up to 1.85×, speedups of up to 3.3× for large networks and $F_4$ tiles, and robust operation in commercial and experimental low-precision deep learning accelerators (Andri et al., 2022).
  • Compatibility with, and superiority over, industrial baselines (e.g., an $F_4$ accelerator 1.5–3.3× faster than an 8× NVDLA $F_2$ configuration at 8 TOP/s and 41 GB/s bandwidth) (Andri et al., 2022).
  • Recovery of theoretical speedup figures (e.g., 2.25× for $F_2$, 4× for $F_4$, 7.03× for F(12,5)) in quantized pipelines (Lohia, 20 Dec 2025, Liu et al., 2020).
  • Elimination of the classical instability barrier via numerical search and learning of transform points (Lohia, 20 Dec 2025).

Research continues toward larger tile sizes, lower bit-widths, and the integration of Winograd-aware design into neural architecture search and quantization-aware training frameworks. The recurring theme is that careful calibration and learning at the transform and quantization levels are critical to preserving network accuracy and robustness as hardware trends toward ever lower precision.


Key References

  • "Going Further With Winograd Convolutions: Tap-Wise Quantization for Efficient Inference on 4x4 Tile" (Andri et al., 2022)
  • "Efficient Residue Number System Based Winograd Convolution" (Liu et al., 2020)
  • "Searching for Winograd-aware Quantized Networks" (Fernandez-Marques et al., 2020)
  • "NOVA: Discovering Well-Conditioned Winograd Transforms through Numerical Optimization of Vandermonde Arithmetic" (Lohia, 20 Dec 2025)
  • "Efficient Winograd Convolution via Integer Arithmetic" (Meng et al., 2019)
