Papers
Topics
Authors
Recent
Gemini 2.5 Flash
Gemini 2.5 Flash
126 tokens/sec
GPT-4o
47 tokens/sec
Gemini 2.5 Pro Pro
43 tokens/sec
o3 Pro
4 tokens/sec
GPT-4.1 Pro
47 tokens/sec
DeepSeek R1 via Azure Pro
28 tokens/sec
2000 character limit reached

Distributed-memory $\mathcal{H}$-matrix Algebra I: Data Distribution and Matrix-vector Multiplication (2008.12441v2)

Published 28 Aug 2020 in math.NA, cs.DC, and cs.NA

Abstract: We introduce a data distribution scheme for $\mathcal{H}$-matrices and a distributed-memory algorithm for $\mathcal{H}$-matrix-vector multiplication. Our data distribution scheme avoids an expensive $\Omega(P2)$ scheduling procedure used in previous work, where $P$ is the number of processes, while data balancing is well-preserved. Based on the data distribution, our distributed-memory algorithm evenly distributes all computations among $P$ processes and adopts a novel tree-communication algorithm to reduce the latency cost. The overall complexity of our algorithm is $O\Big(\frac{N \log N}{P} + \alpha \log P + \beta \log2 P \Big)$ for $\mathcal{H}$-matrices under weak admissibility condition, where $N$ is the matrix size, $\alpha$ denotes the latency, and $\beta$ denotes the inverse bandwidth. Numerically, our algorithm is applied to address both two- and three-dimensional problems of various sizes among various numbers of processes. On thousands of processes, good parallel efficiency is still observed.

Citations (2)

Summary

We haven't generated a summary for this paper yet.