
TabConv: Low-Computation CNN Inference via Table Lookups (2404.05872v1)

Published 8 Apr 2024 in cs.CV, cs.LG, and cs.NE

Abstract: Convolutional Neural Networks (CNNs) have demonstrated remarkable ability throughout the field of computer vision. However, CNN inference requires a large number of arithmetic operations, making these models expensive to deploy in hardware. Current approaches alleviate this issue by developing hardware-supported, algorithmic processes to simplify spatial convolution functions. However, these methods still heavily rely on matrix multiplication, leading to significant computational overhead. To bridge the gap between hardware, algorithmic acceleration, and approximate matrix multiplication, we propose TabConv, a novel, table-based approximation for convolution to significantly reduce arithmetic operations during inference. Additionally, we introduce a priority masking technique based on cosine similarity to select layers for table-based approximation, thereby maintaining the model performance. We evaluate our approach on popular CNNs: ResNet-18, ResNet-34, and Network in Network (NIN). TabConv preserves over 93% of the original model's performance while reducing arithmetic operations by 36.5%, 25.8%, and 99.4% for ResNet-18 on CIFAR-10, CIFAR-100, and MNIST, respectively, 35.6% and 99.3% for ResNet-34 on CIFAR-10 and MNIST, and 98.9% for NIN on MNIST, achieving low-computation inference.
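
To make the table-lookup idea concrete, below is a minimal NumPy sketch in the spirit of product-quantization-style approximate matrix multiplication: a convolution lowered to a GEMM via im2col is replaced by per-subspace prototype encoding plus precomputed table lookups, and a cosine-similarity score of the kind a priority mask could use ranks how safely a layer can be approximated. The function names, subspace/prototype counts, and the plain k-means encoder are illustrative assumptions, not the paper's implementation.

```python
# Illustrative sketch only: a product-quantization-style table-lookup GEMM
# standing in for an im2col'd convolution, plus a cosine-similarity score
# for ranking layers. Names and hyperparameters are assumptions, not TabConv's code.
import numpy as np

def train_tables(A_train, W, n_subspaces=8, n_prototypes=16, iters=25):
    """Offline: learn per-subspace prototypes (plain k-means) over sample
    activations A_train (N x D) and precompute their dot products with the
    weight matrix W (D x M)."""
    D = A_train.shape[1]
    subspaces = np.array_split(np.arange(D), n_subspaces)
    prototypes, tables = [], []
    rng = np.random.default_rng(0)
    for idx in subspaces:
        X = A_train[:, idx]                                   # (N, d_s) activation slice
        C = X[rng.choice(len(X), n_prototypes, replace=False)].copy()
        for _ in range(iters):                                # simple k-means
            assign = np.argmin(((X[:, None, :] - C[None]) ** 2).sum(-1), axis=1)
            for k in range(n_prototypes):
                if np.any(assign == k):
                    C[k] = X[assign == k].mean(0)
        prototypes.append(C)
        tables.append(C @ W[idx, :])                          # (K, M) precomputed partial products
    return subspaces, prototypes, tables

def lookup_matmul(A, subspaces, prototypes, tables):
    """Online: encode each row of A by its nearest prototype per subspace,
    then replace multiplications with table lookups and additions."""
    out = np.zeros((A.shape[0], tables[0].shape[1]))
    for idx, C, T in zip(subspaces, prototypes, tables):
        X = A[:, idx]
        codes = np.argmin(((X[:, None, :] - C[None]) ** 2).sum(-1), axis=1)
        out += T[codes]                                       # gather instead of multiply
    return out

def layer_priority(exact_out, approx_out, eps=1e-12):
    """Cosine similarity between exact and approximated layer outputs; layers
    scoring high are safer candidates for table-based replacement."""
    a, b = exact_out.ravel(), approx_out.ravel()
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + eps))

# Toy usage on random data standing in for an im2col'd layer (A @ W).
rng = np.random.default_rng(1)
A_train = rng.normal(size=(512, 64))
A, W = rng.normal(size=(32, 64)), rng.normal(size=(64, 10))
subspaces, prototypes, tables = train_tables(A_train, W)
approx, exact = lookup_matmul(A, subspaces, prototypes, tables), A @ W
print("cosine similarity:", layer_priority(exact, approx))
```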

