Optimal factor storage (sparse vs dense) for explicit GPU assembly in 3D FETI
For 3D finite element meshes, determine whether explicit GPU-based assembly of the local dual operator \tilde{\Lambda}_i = \tilde{B}_i K_{i,reg}^{-1} \tilde{B}_i^\top in the FETI solver runs faster when the Cholesky factors of K_{i,reg} are stored in sparse format (cuSPARSE TRSM on CSR/CSC) or in dense format (cuBLAS TRSM on column-major storage), and delineate the subdomain-size and sparsity regimes in which each choice is optimal.
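To make the trade-off concrete, the following is a minimal back-of-the-envelope sketch, not the method of the cited paper. It assumes a Cholesky factorization K_{i,reg} = L L^\top (the symbol L is introduced here only for illustration), so that applying K_{i,reg}^{-1} to the columns of \tilde{B}_i^\top amounts to triangular solves with L. The flop counts are the standard leading-order estimates. The function names and the two sustained-throughput parameters are hypothetical placeholders, not measured values, and the model ignores kernel launch overhead, format conversion cost, and the limited parallelism of sparse triangular solves.

```python
def dense_trsm_flops(n, n_rhs):
    """Leading-order flops of a dense n x n triangular solve with n_rhs right-hand sides."""
    return n * n * n_rhs


def sparse_trsm_flops(nnz_factor, n_rhs):
    """Leading-order flops of a sparse triangular solve: one multiply-add per stored nonzero."""
    return 2 * nnz_factor * n_rhs


def factor_density(n, nnz_factor):
    """Fraction of the n(n+1)/2 entries of the triangle that are stored as nonzeros."""
    return nnz_factor / (n * (n + 1) / 2)


def predicted_choice(n, nnz_factor, n_rhs, dense_gflops, sparse_gflops):
    """Pick the format with the smaller estimated solve time.

    dense_gflops and sparse_gflops are ASSUMED sustained throughputs of the
    dense and sparse TRSM kernels; they must come from measurements on the
    target GPU and are not provided by this sketch.
    """
    t_dense = dense_trsm_flops(n, n_rhs) / (dense_gflops * 1e9)
    t_sparse = sparse_trsm_flops(nnz_factor, n_rhs) / (sparse_gflops * 1e9)
    return "sparse" if t_sparse < t_dense else "dense"
```

Under this simplified model the sparse format is predicted to win only when 2 nnz(L) / n^2 falls below the ratio of sparse to dense throughput. Fill-in in 3D meshes pushes nnz(L) up, which is one way to see why the better option is not obvious there.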
References
As can be observed in the graph, for 3D meshes where the factors are denser, it is unclear which is the better option.
— Assembly of FETI dual operator using CUDA
(2502.08382 - Homola et al., 12 Feb 2025) in Section “Optimal parameters of the assembly,” Factor storage paragraph (Results)