LRAMM -- Low precision approximates GEMM via RSVD (2405.16917v1)

Published 27 May 2024 in math.NA, cs.NA, and cs.PF

Abstract: Accelerating matrix multiplication has been a research hotspot across many domains. Because of the characteristics of some applications, approximate matrix multiplication can achieve significant performance improvements without losing much precision. In this paper, we propose LRAMM, a high-performance approximate matrix multiplication algorithm that combines mixed-precision quantized matrix multiplication with randomized SVD (RSVD), using low-rank matrix decomposition to further improve efficiency within the error range of low-precision matrix multiplication.
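
The abstract describes the approach only at a high level. As a rough illustration of the general recipe it names, the following is a minimal NumPy sketch that combines a randomized SVD (in the style of Halko, Martinsson, and Tropp) with symmetric per-tensor int8 quantization to approximate a GEMM. The function names, the quantization scheme, and the rank/oversampling parameters are illustrative assumptions, not the authors' LRAMM implementation.

```python
import numpy as np

def rsvd(A, k, oversample=10, n_power_iter=2, seed=0):
    """Randomized SVD: rank-k factorization A ~= U @ diag(s) @ Vt."""
    rng = np.random.default_rng(seed)
    n = A.shape[1]
    # Sketch the range of A with a Gaussian test matrix; refine with power iterations.
    Y = A @ rng.standard_normal((n, k + oversample))
    for _ in range(n_power_iter):
        Y = A @ (A.T @ Y)
    Q, _ = np.linalg.qr(Y)
    # Exact SVD of the small projected matrix, lifted back to the original space.
    Uh, s, Vt = np.linalg.svd(Q.T @ A, full_matrices=False)
    return (Q @ Uh)[:, :k], s[:k], Vt[:k, :]

def quantize_int8(X):
    """Symmetric per-tensor int8 quantization: X ~= scale * Xq."""
    scale = max(float(np.abs(X).max()), 1e-12) / 127.0
    Xq = np.clip(np.rint(X / scale), -127, 127).astype(np.int8)
    return Xq, scale

def int8_gemm(Xq, sx, Yq, sy):
    """Low-precision GEMM: accumulate in int32, then rescale to float."""
    return (sx * sy) * (Xq.astype(np.int32) @ Yq.astype(np.int32))

def approx_matmul(A, B, k):
    """LRAMM-style approximate A @ B: RSVD-compress A, then two small int8 GEMMs."""
    U, s, Vt = rsvd(A, k)              # A ~= L @ R, with L = U * s (m x k), R = Vt (k x n)
    L, R = U * s, Vt
    Rq, sR = quantize_int8(R)
    Bq, sB = quantize_int8(B)
    T = int8_gemm(Rq, sR, Bq, sB)      # k x p intermediate (~k*n*p int8 MACs)
    Lq, sL = quantize_int8(L)
    Tq, sT = quantize_int8(T)
    return int8_gemm(Lq, sL, Tq, sT)   # m x p result (~m*k*p int8 MACs)
```

With a target rank k much smaller than the inner dimension, the two quantized products cost roughly k·n·p + m·k·p int8 multiply-accumulates instead of m·n·p full-precision ones, which is the kind of trade-off the abstract points to; the resulting accuracy depends on how quickly the singular values of A decay and on the quantization error of the factors.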

