LRAMM -- Low precision approximates GEMM via RSVD (2405.16917v1)
Published 27 May 2024 in math.NA, cs.NA, and cs.PF
Abstract: Accelerating matrix multiplication has been a research hotspot across many domains. Because of the characteristics of some applications, approximate matrix multiplication can deliver significant performance improvements with little loss of accuracy. In this paper, we propose LRAMM, a high-performance approximate matrix multiplication algorithm that combines mixed-precision quantized matrix multiplication with RSVD. By exploiting low-rank matrix decomposition, it further improves efficiency while remaining within the error range of low-precision matrix multiplication.
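To make the idea concrete, below is a minimal NumPy sketch, not the paper's implementation, of the general recipe the abstract describes: compute a randomized rank-k factorization (RSVD) of one input matrix, then carry out the remaining, much smaller matrix products in simulated int8 arithmetic. The function names, the fixed rank `k`, the oversampling amount, and the symmetric per-matrix quantization scheme are illustrative assumptions.

```python
import numpy as np

def rsvd(A, k, oversample=8):
    """Randomized SVD (Halko et al. style): rank-k approximation A ~ U @ diag(s) @ Vt."""
    m, n = A.shape
    # Sketch the column space of A with a Gaussian test matrix.
    omega = np.random.randn(n, k + oversample)
    Q, _ = np.linalg.qr(A @ omega)
    # Project A onto the sketched subspace and take a small exact SVD.
    B = Q.T @ A
    Ub, s, Vt = np.linalg.svd(B, full_matrices=False)
    U = Q @ Ub
    return U[:, :k], s[:k], Vt[:k, :]

def quantize_int8(X):
    """Symmetric per-matrix int8 quantization: X ~ scale * Xq."""
    scale = np.abs(X).max() / 127.0 if X.size else 1.0
    Xq = np.clip(np.round(X / scale), -127, 127).astype(np.int8)
    return Xq, scale

def lramm_sketch(A, B, k=32):
    """Approximate A @ B: low-rank factor A, then do the small GEMMs in simulated int8."""
    U, s, Vt = rsvd(A, k)                                # A ~ U @ diag(s) @ Vt
    # First small GEMM in int8: T ~ Vt @ B, shape (k, p).
    Vq, sv = quantize_int8(Vt)
    Bq, sb = quantize_int8(B)
    T = (Vq.astype(np.int32) @ Bq.astype(np.int32)) * (sv * sb)
    # Scale by the singular values, then second small GEMM: C ~ U @ (diag(s) @ T).
    Uq, su = quantize_int8(U)
    Tq, st = quantize_int8(s[:, None] * T)
    return (Uq.astype(np.int32) @ Tq.astype(np.int32)) * (su * st)

if __name__ == "__main__":
    A = np.random.randn(512, 256)
    B = np.random.randn(256, 384)
    C_approx = lramm_sketch(A, B, k=64)
    rel_err = np.linalg.norm(C_approx - A @ B) / np.linalg.norm(A @ B)
    print(f"relative error: {rel_err:.3e}")
```

In this sketch the two quantized GEMMs involve roughly k·n·p + m·k·p integer multiply-accumulates instead of the m·n·p floating-point ones of an exact product, and the overall error combines the discarded singular values with the int8 rounding error; an optimized implementation would map the integer products to hardware low-precision units.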