Conjecture on combining quantization and sparsity into unified data representations with hardware support
Develop unified data representation schemes that jointly integrate quantization and sparsity, together with corresponding hardware support, to more aggressively reduce the average number of bits required to store and manipulate neural-network weights and activations, while maintaining efficiency and model accuracy across end-to-end ML accelerator execution.
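One way such a unified scheme could look (a hypothetical sketch, not a representation proposed in the paper): prune small weights, quantize the survivors to int4, and store a one-bit presence bitmap alongside the packed quantized values, so the average storage cost per element combines the savings of both techniques. The `compress`/`decompress` helpers and the threshold/scale values below are illustrative assumptions.

```python
# Hypothetical sketch: combining sparsity (presence bitmap) with
# quantization (signed int4) in one compressed representation, and
# measuring the average number of bits per stored element.

def compress(weights, scale, threshold=0.05):
    """Prune near-zero weights, quantize survivors to signed int4,
    and return a presence bitmap plus the list of quantized values."""
    bitmap, values = [], []
    for w in weights:
        if abs(w) < threshold:
            bitmap.append(0)  # pruned: costs only its 1 bitmap bit
        else:
            q = max(-8, min(7, round(w / scale)))  # clamp to int4 range
            bitmap.append(1)
            values.append(q)
    return bitmap, values

def decompress(bitmap, values, scale):
    """Reconstruct the dense tensor from bitmap + quantized nonzeros."""
    out, it = [], iter(values)
    for b in bitmap:
        out.append(next(it) * scale if b else 0.0)
    return out

def avg_bits_per_element(bitmap, values, value_bits=4):
    # 1 bitmap bit per element + value_bits per surviving nonzero
    return (len(bitmap) + value_bits * len(values)) / len(bitmap)

weights = [0.0, 0.31, -0.02, 0.5, 0.0, -0.24, 0.01, 0.12]
bitmap, values = compress(weights, scale=0.05)
print(avg_bits_per_element(bitmap, values))  # → 3.0, vs. 32 for fp32
```

With half the elements pruned, the average cost drops to 3 bits per element (1 bitmap bit plus an amortized 2 bits of int4 payload), illustrating the conjecture's point that sparsity and quantization jointly reduce average bit width further than either alone; real hardware-supported formats would add block structure and alignment constraints omitted here.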
References
It is also important to note that quantization and sparsity may be seen as two facets of the same data representation problem: we conjecture that in the future they will be combined in advanced data representation schemes with hardware support aiming at more aggressively reducing the number of bits needed, on average, to store and manipulate weights and activations.
— How to keep pushing ML accelerator performance? Know your rooflines!
(2505.16346 - Verhelst et al., 22 May 2025) in Section 3.4 (Exploiting Sparsity)