Efficient Distributed-Memory Parallel Matrix-Vector Multiplication with Wide or Tall Unstructured Sparse Matrices
Abstract: This paper presents an efficient technique for matrix-vector and vector-transpose-matrix multiplication in distributed-memory parallel computing environments, where the matrices are unstructured, sparse, and have substantially more columns than rows, or vice versa. Our method allows for parallel I/O, does not require extensive preprocessing, and has the same communication complexity as matrix-vector multiplies with column or row partitioning. Our implementation of the method uses MPI. We partition the matrix by individual nonzero elements, rather than by row or column, and use an "overlapped" vector representation that is matched to the matrix. The transpose multiplies use matrix-specific MPI communicators and reductions, which we show can be set up efficiently. The proposed technique achieves good per-processor work balance even if some of the columns are dense, while keeping communication costs relatively low.
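To make the idea of partitioning by individual nonzeros concrete, here is a minimal single-process sketch in Python. It is not the paper's MPI implementation. The function names (`partition_nonzeros`, `local_matvec`, `simulated_matvec`), the column-major ordering of the nonzeros, and the plain loop standing in for an MPI reduction are all assumptions made for illustration.

```python
from collections import defaultdict


def partition_nonzeros(coo, nprocs):
    """Split (row, col, value) triples into nprocs contiguous, near-equal chunks.

    Hypothetical helper: each chunk stands in for one rank's share of the matrix.
    """
    base, extra = divmod(len(coo), nprocs)
    chunks, start = [], 0
    for p in range(nprocs):
        size = base + (1 if p < extra else 0)
        chunks.append(coo[start:start + size])
        start += size
    return chunks


def local_matvec(chunk, x):
    """Partial product from one rank's nonzeros; only the rows it touches appear."""
    partial = defaultdict(float)
    for i, j, v in chunk:
        partial[i] += v * x[j]
    return partial


def simulated_matvec(coo, x, nrows, nprocs):
    """Compute y = A x by summing per-rank partial results.

    The accumulation loop is a stand-in for a reduction across ranks.
    """
    y = [0.0] * nrows
    for chunk in partition_nonzeros(coo, nprocs):
        for i, s in local_matvec(chunk, x).items():
            y[i] += s
    return y


# A 2 x 5 "wide" matrix whose column 0 is dense (nonzero in every row).
# Ordering the nonzeros by column is an assumption of this sketch.
coo = sorted(
    [(0, 0, 1.0), (1, 0, 2.0), (0, 3, 4.0), (1, 4, 5.0), (0, 1, 3.0)],
    key=lambda t: (t[1], t[0]),
)
x = [1.0, 2.0, 3.0, 4.0, 5.0]

chunk_sizes = [len(c) for c in partition_nonzeros(coo, 3)]
y = simulated_matvec(coo, x, nrows=2, nprocs=3)
```

Because every simulated rank receives nearly the same number of nonzeros, a dense column does not concentrate work on one rank, which is the load-balance property the abstract describes. In a real distributed setting each rank would hold only the entries of `x` that its own nonzeros reference, and the final summation would be a collective reduction rather than a loop.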