Analysis of A Splitting Approach for the Parallel Solution of Linear Systems on GPU Cards (1509.07919v1)

Published 25 Sep 2015 in cs.DC, cs.MS, and cs.NA

Abstract: We discuss an approach for solving sparse or dense banded linear systems ${\bf A} {\bf x} = {\bf b}$ on a Graphics Processing Unit (GPU) card. The matrix ${\bf A} \in {\mathbb{R}}^{N \times N}$ is possibly nonsymmetric and moderately large; i.e., $10000 \leq N \leq 500000$. The ${\it split\ and\ parallelize}$ (${\tt SaP}$) approach seeks to partition the matrix ${\bf A}$ into diagonal sub-blocks ${\bf A}_i$, $i=1,\ldots,P$, which are independently factored in parallel. The solution may choose to consider or to ignore the matrices that couple the diagonal sub-blocks ${\bf A}_i$. This approach, along with the Krylov subspace-based iterative method that it preconditions, is implemented in a solver called ${\tt SaP::GPU}$, which is compared in terms of efficiency with three commonly used sparse direct solvers: ${\tt PARDISO}$, ${\tt SuperLU}$, and ${\tt MUMPS}$. ${\tt SaP::GPU}$, which runs entirely on the GPU except for several stages involved in preliminary row-column permutations, is robust and compares well in terms of efficiency with the aforementioned direct solvers. In a comparison against Intel's ${\tt MKL}$, ${\tt SaP::GPU}$ also fares well when used to solve dense banded systems that are close to being diagonally dominant. ${\tt SaP::GPU}$ is publicly available and distributed as open source under a permissive BSD3 license.
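The variant of the approach that ignores the coupling matrices can be illustrated with a small CPU-only sketch. It is not the ${\tt SaP::GPU}$ implementation: every function name below is hypothetical, the blocks are factored in a sequential loop rather than in parallel on a GPU, LU is done without pivoting (assuming diagonally dominant blocks), and the Krylov method is swapped for a plain preconditioned Richardson iteration to keep the example short.

```python
def lu_factor(block):
    """In-place-style Doolittle LU without pivoting (assumes dominant diagonal)."""
    n = len(block)
    lu = [row[:] for row in block]
    for k in range(n):
        for i in range(k + 1, n):
            lu[i][k] /= lu[k][k]
            for j in range(k + 1, n):
                lu[i][j] -= lu[i][k] * lu[k][j]
    return lu


def lu_solve(lu, rhs):
    """Solve L U y = rhs with the packed factors from lu_factor."""
    n = len(lu)
    y = rhs[:]
    for i in range(n):                      # forward substitution, unit lower L
        for j in range(i):
            y[i] -= lu[i][j] * y[j]
    for i in reversed(range(n)):            # backward substitution with U
        for j in range(i + 1, n):
            y[i] -= lu[i][j] * y[j]
        y[i] /= lu[i][i]
    return y


def split_blocks(a, p):
    """Partition A into p diagonal sub-blocks A_i and factor each one.

    Each factorization is independent of the others, which is what makes
    this step parallelizable; here it is a sequential loop.
    """
    n = len(a)
    bounds = [n * i // p for i in range(p + 1)]
    return [(lo, hi, lu_factor([row[lo:hi] for row in a[lo:hi]]))
            for lo, hi in zip(bounds, bounds[1:])]


def apply_preconditioner(blocks, r):
    """Apply the block-diagonal preconditioner; coupling blocks are ignored."""
    z = [0.0] * len(r)
    for lo, hi, lu in blocks:
        z[lo:hi] = lu_solve(lu, r[lo:hi])
    return z


def matvec(a, x):
    return [sum(aij * xj for aij, xj in zip(row, x)) for row in a]


def sap_decoupled_solve(a, b, p, tol=1e-10, max_iter=200):
    """Preconditioned Richardson iteration (a stand-in for a Krylov solver)."""
    blocks = split_blocks(a, p)
    x = [0.0] * len(b)
    for it in range(max_iter):
        r = [bi - axi for bi, axi in zip(b, matvec(a, x))]
        if max(abs(v) for v in r) < tol:
            return x, it
        z = apply_preconditioner(blocks, r)
        x = [xi + zi for xi, zi in zip(x, z)]
    raise RuntimeError("no convergence within max_iter iterations")


def banded_example(n):
    """Nonsymmetric, strictly diagonally dominant tridiagonal test matrix."""
    a = [[0.0] * n for _ in range(n)]
    for i in range(n):
        a[i][i] = 4.0
        if i > 0:
            a[i][i - 1] = -1.0
        if i + 1 < n:
            a[i][i + 1] = -0.5
    return a
```

Calling `sap_decoupled_solve(banded_example(8), b, p=2)` splits the matrix into two 4-by-4 diagonal blocks and drops the two entries that couple them. The iteration then has to make up for the dropped coupling, which is why this variant is only expected to work well when the matrix is close to diagonally dominant.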
