Fast Kronecker Matrix-Matrix Multiplication on GPUs (2401.10187v3)
Abstract: Kronecker Matrix-Matrix Multiplication (Kron-Matmul) is the multiplication of a matrix with the Kronecker product of several smaller matrices. Kron-Matmul is a core operation in many scientific and machine learning computations. State-of-the-art Kron-Matmul implementations are built from existing tensor algebra operations, such as matrix multiplication, transpose, and tensor-matrix multiplication. However, this design choice prevents several Kron-Matmul-specific optimizations, leaving significant performance on the table. To address this issue, we present FastKron, an efficient technique for Kron-Matmul on single and multiple GPUs. FastKron is independent of existing linear algebra operations, which enables several new optimizations for Kron-Matmul. Thus, it performs up to 40.7x faster than existing implementations on 1 GPU and up to 7.85x faster on 16 GPUs.
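To make the operation concrete, the sketch below shows a standard factor-at-a-time evaluation of Kron-Matmul in NumPy: it computes X multiplied by the Kronecker product of the factors without ever materializing the full product. This is a minimal illustration of the underlying algebra only; the function name and shapes are assumptions for exposition and are not FastKron's API.

```python
import numpy as np

def kron_matmul(x, factors):
    """Compute x @ (factors[0] kron factors[1] kron ... kron factors[-1])
    one factor at a time, never forming the full Kronecker product."""
    m = x.shape[0]
    y = x
    for f in reversed(factors):
        p, q = f.shape
        # Contract the trailing input axis with this factor ...
        y = y.reshape(-1, p) @ f
        # ... then rotate the new output axis to the front of the
        # non-batch axes, so the next factor sees its input axis last.
        y = y.reshape(m, -1, q).transpose(0, 2, 1).reshape(m, -1)
    return y

# Sanity check against the naive approach on small factors.
rng = np.random.default_rng(0)
fs = [rng.standard_normal((3, 4)), rng.standard_normal((2, 5))]
x = rng.standard_normal((7, 3 * 2))
assert np.allclose(kron_matmul(x, fs), x @ np.kron(fs[0], fs[1]))
```

Each pass costs only a dense matmul plus a transpose, which is why Kron-Matmul is far cheaper than materializing the Kronecker product; the paper's point is that composing it from these generic library operations still leaves Kron-Matmul-specific optimizations unexploited.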