Fast hardware-aware matrix-free algorithm for higher-order finite-element discretized matrix multivector products on distributed systems

Published 15 Aug 2022 in physics.comp-ph | (2208.07129v5)

Abstract: Recent hardware-aware matrix-free algorithms for higher-order finite-element (FE) discretized matrix-vector multiplications reduce floating point operations and data access costs compared to traditional sparse matrix approaches. This work proposes efficient matrix-free algorithms for evaluating FE discretized matrix-multivector products on both multi-node CPU and GPU architectures. We address a critical gap in existing matrix-free implementations, which are well suited only for the action of FE discretized matrices on a single vector. We employ batched evaluation strategies, with the batchsize tailored to underlying hardware architectures, leading to better data locality and enabling further parallelization. On CPUs, we utilize even-odd decomposition, SIMD vectorization, and overlapping computation and communication strategies. On GPUs, we employ strategies to overlap compute and data movement in conjunction with GPU shared memory, constant memory, and kernel fusion to reduce data accesses. Our implementation outperforms the baselines for Helmholtz operator action, achieving up to 1.4x improvement on one CPU node and up to 2.8x on one GPU node, while reaching up to 4.4x and 1.5x improvement on multiple nodes for CPUs ($\sim 3000$ cores) and GPUs ($\sim$ 25 GPUs), respectively. We further benchmark the performance of the proposed implementation for solving a model eigenvalue problem for 1024 smallest eigenvalue-eigenvector pairs by employing the Chebyshev Filtered Subspace Iteration method, achieving up to 1.5x improvement on one CPU node and up to 2.2x on one GPU node while reaching up to 3.0x and 1.4x improvement on multinode CPUs ($\sim 3000$ cores) and GPUs ($\sim$ 25 GPUs), respectively.

Abstract PDF Upgrade to Chat

Summary

No one has generated a summary of this paper yet.

Paper to Video (Beta)

No one has generated a video about this paper yet.

Whiteboard

No one has generated a whiteboard explanation for this paper yet.

Paper Prompts

Top Community Prompts

Explain it Like I'm 14

off on

Knowledge Gaps

off on

Practical Applications

off on

Glossary

off on

Conceptual Simplification

off on

Open Problems

We haven't generated a list of open problems mentioned in this paper yet.

Generate Now

Continue Learning

We haven't generated follow-up questions for this paper yet.

Generate Now

Fast hardware-aware matrix-free algorithm for higher-order finite-element discretized matrix multivector products on distributed systems

Summary

Paper to Video (Beta)

Whiteboard

Paper Prompts

Top Community Prompts

Open Problems

Continue Learning

Authors (4)

Collections

Fast hardware-aware matrix-free algorithm for higher-order finite-element discretized matrix multivector products on distributed systems

Summary

Paper to Video (Beta)

Whiteboard

Paper Prompts

Top Community Prompts

Open Problems

Continue Learning

Related Papers

Authors (4)

Collections