TabConv: Low-Computation CNN Inference via Table Lookups (2404.05872v1)
Abstract: Convolutional Neural Networks (CNNs) have demonstrated remarkable ability across the field of computer vision. However, CNN inference requires a large number of arithmetic operations, making these models expensive to deploy in hardware. Current approaches alleviate this issue with hardware-supported algorithmic optimizations that simplify spatial convolution. However, these methods still rely heavily on matrix multiplication, which incurs significant computational overhead. To bridge the gap between hardware acceleration, algorithmic acceleration, and approximate matrix multiplication, we propose TabConv, a novel table-based approximation of convolution that significantly reduces arithmetic operations during inference. Additionally, we introduce a priority masking technique based on cosine similarity to select which layers to approximate with tables, thereby maintaining model performance. We evaluate our approach on popular CNNs: ResNet-18, ResNet-34, and Network in Network (NIN). TabConv preserves over 93% of the original models' performance while reducing arithmetic operations by 36.5%, 25.8%, and 99.4% for ResNet-18 on CIFAR-10, CIFAR-100, and MNIST, respectively; by 35.6% and 99.3% for ResNet-34 on CIFAR-10 and MNIST; and by 98.9% for NIN on MNIST, achieving low-computation inference.
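For intuition, below is a minimal NumPy sketch of the two ideas the abstract names, under stated assumptions: a convolution is first lowered to a matrix product A @ W via im2col, and that product is then approximated product-quantization-style with per-subspace prototypes and precomputed lookup tables; a cosine-similarity score suggests which layers tolerate the approximation. All function names, shapes, and hyperparameters (such as C and K) are illustrative assumptions, not TabConv's actual implementation.

```python
import numpy as np

def fit_tables(A_train, W, C=4, K=16, iters=10):
    """Fit K prototypes in each of C subspaces of the im2col'd activations
    A_train (N, D), then precompute tables[c, k] = prototype @ W_c so the
    per-subspace partial products of A @ W become lookups. Assumes C divides D."""
    N, D = A_train.shape
    d = D // C
    rng = np.random.default_rng(0)
    protos = np.empty((C, K, d))
    tables = np.empty((C, K, W.shape[1]))
    for c in range(C):
        Ac = A_train[:, c*d:(c+1)*d]
        P = Ac[rng.choice(N, K, replace=False)]       # k-means init
        for _ in range(iters):                        # Lloyd's iterations
            assign = np.argmin(((Ac[:, None] - P[None]) ** 2).sum(-1), axis=1)
            for k in range(K):
                pts = Ac[assign == k]
                if len(pts):
                    P[k] = pts.mean(0)
        protos[c] = P
        tables[c] = P @ W[c*d:(c+1)*d]                # (K, M) partial products
    return protos, tables

def lookup_matmul(A, protos, tables):
    """Approximate A @ W: encode each row per subspace by its nearest
    prototype, then sum the precomputed partial products (lookups + adds,
    no full matrix multiplication)."""
    C, K, d = protos.shape
    out = np.zeros((A.shape[0], tables.shape[2]))
    for c in range(C):
        Ac = A[:, c*d:(c+1)*d]
        codes = np.argmin(((Ac[:, None] - protos[c][None]) ** 2).sum(-1), axis=1)
        out += tables[c][codes]
    return out

def layer_priority(exact_out, approx_out):
    """Cosine similarity between a layer's exact and table-approximated
    outputs; one plausible reading of the priority-masking score used to
    decide which layers are safe to approximate."""
    a, b = exact_out.ravel(), approx_out.ravel()
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

# Toy check on random data: the approximation should track the exact product.
A_tr, W = np.random.randn(512, 64), np.random.randn(64, 32)
protos, tables = fit_tables(A_tr, W)
A = np.random.randn(8, 64)
print(layer_priority(A @ W, lookup_matmul(A, protos, tables)))
```

In this reading, layers are ranked by the cosine score and only the highest-scoring ones are replaced with table lookups, which matches the abstract's framing of trading arithmetic for memory accesses while preserving accuracy.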