FSAI Preconditioner for Sparse Linear Systems
- The FSAI preconditioner is a matrix preconditioning method that factors an approximate inverse into sparse matrices to boost convergence of iterative solvers.
- It employs Frobenius norm minimization with prescribed sparsity patterns and QR/SVD-based algorithms, ensuring efficient and parallelizable construction.
- The technique scales well on modern GPU/CPU architectures, reducing iteration counts while maintaining stability across various application domains.
The factorized sparse approximate inverse (FSAI) preconditioner is a matrix preconditioning technique designed to efficiently accelerate the convergence of iterative solvers for large, sparse linear systems. Rather than approximating the system matrix's inverse with a single sparse matrix, FSAI decomposes the inverse into sparse matrix factors, customized for parallel architectures and adaptable to numerous algebraic and application settings. FSAI methods minimize the Frobenius norm of the difference between a transformed product and the identity matrix, subject to user-prescribed sparsity patterns, yielding robust preconditioners that can be tailored to both matrix structure and computational constraints.
1. Algebraic Formulation and Minimization Principles
The generic FSAI approach seeks sparse factors $G_L$ and $G_U$ such that the preconditioner $M^{-1} = G_U G_L$ closely approximates $A^{-1}$ for a nonsingular system $Ax = b$, with $G_L A G_U \approx I$. The matrices $G_L$ and $G_U$ are constrained to prescribed sparse subspaces $\mathcal{W}_L$ and $\mathcal{W}_U$, ensuring that $M^{-1}$ is inexpensive to apply (often triangular or block-diagonal). The core minimization principle is:
$$\min_{G_L \in \mathcal{W}_L,\; G_U \in \mathcal{W}_U} \left\| I - G_L A G_U \right\|_F .$$
After $G_L$ and $G_U$ are constructed, $M^{-1} = G_U G_L$ is used to right-precondition the original system:
$$A\, G_U G_L\, y = b, \qquad x = G_U G_L\, y .$$
This Frobenius-norm minimization decouples across columns, allowing full parallelization in the setup phase. The selection of sparsity patterns for $G_L$ and $G_U$ is critical; heuristics include Neumann series expansions, powers of adjacency graphs, or patterns determined by the structure of $A$ and its blocks. For block systems, as in partition-of-unity or mortar-type coupled problems, FSAI may be applied to subblocks with independent patterns (Byckling et al., 2012, Firmbach et al., 2024, Recio et al., 8 Sep 2025).
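The column-wise decoupling just described can be sketched for the SPD case, where a single lower-triangular factor $G$ is built so that $G^{T} G \approx A^{-1}$. The helper `fsai_spd` below and its diagonal-scaling convention are illustrative, not code from the cited papers:

```python
import numpy as np
import scipy.sparse as sp

def fsai_spd(A, pattern):
    """Illustrative static FSAI for an SPD matrix A.

    pattern[i] lists the columns allowed in row i of the lower-triangular
    factor G; G^T G then approximates A^{-1}.  Each row is an independent
    small dense solve, which is what makes the setup embarrassingly parallel.
    """
    A = sp.csr_matrix(A, dtype=float)
    n = A.shape[0]
    rows, cols, vals = [], [], []
    for i in range(n):
        # keep only lower-triangular positions and force the diagonal in
        Pi = sorted(j for j in set(pattern[i]) | {i} if j <= i)
        Asub = A[Pi, :][:, Pi].toarray()      # small principal submatrix
        e = np.zeros(len(Pi))
        e[-1] = 1.0                           # right-hand side selects index i
        g = np.linalg.solve(Asub, e)          # unscaled row of G
        g /= np.sqrt(g[-1])                   # enforce (G A G^T)_ii = 1
        rows += [i] * len(Pi)
        cols += Pi
        vals += list(g)
    return sp.csr_matrix((vals, (rows, cols)), shape=(n, n))
```

With the full lower-triangular pattern, $G$ reproduces the inverse Cholesky factor and $G A G^{T} = I$ exactly; sparser patterns trade accuracy for cheaper application.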
2. Construction Algorithms and Pattern Selection
FSAI preconditioner construction involves solving small, independent least-squares or singular value problems for each column (or block row), with algorithms broadly falling into two classes:
Algorithm 1 (QR-based, unit-norm columns in the factor):
- For each column $j$, form the submatrix $A_j$ of $A$ selected by the prescribed sparsity pattern of column $j$
- Compute a sparse QR decomposition $A_j = Q_j R_j$
- Extract the columns of $Q_j$ allowed by the pattern; the target vector $v_j$ is the dominant right singular vector of this extracted block
- Solve a small least-squares problem to fit the column to $v_j$
- Assemble $G_L$ and $G_U$ column by column
Algorithm 2 (SVD-based, unit-norm columns in the factor):
- For each column $j$, form the relevant submatrix of $A$ and exclude the rows ruled out by the pattern's nonzero locations, yielding a reduced matrix $\hat{A}_j$
- Perform a sparse QR and an SVD on $\hat{A}_j$ (or its triangular factor); $v_j$ is taken as the right singular vector associated with the smallest singular value
- Project $v_j$ onto the prescribed pattern to obtain the column of the factor
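Both algorithms reduce, per column, to a small restricted least-squares solve. The dense sketch below (hypothetical helper `spai_column`, using plain `numpy.linalg.lstsq` in place of the sparse QR/SVD machinery of the papers) shows the shape of that step:

```python
import numpy as np

def spai_column(A, j, pattern_j):
    """Solve min ||A[:, pattern_j] g - e_j||_2 for one column of an
    approximate-inverse factor, restricted to the allowed entries.
    Assumes A[j, k] != 0 for some k in pattern_j, so that row j enters
    the restricted problem."""
    n = A.shape[0]
    Asub = A[:, pattern_j]                            # columns allowed by pattern
    rows = np.nonzero(np.any(Asub != 0, axis=1))[0]   # rows actually touched
    e = (rows == j).astype(float)                     # restriction of e_j
    g, *_ = np.linalg.lstsq(Asub[rows, :], e, rcond=None)
    col = np.zeros(n)
    col[pattern_j] = g
    return col
```

Because each column's problem involves only the rows touched by its pattern, the solves stay small and mutually independent.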
Sparsity patterns for and are chosen via graph enrichment (powers of adjacency graphs), level-of-fill controls, and numerical dropping techniques. Adaptive approaches (aFSAI) incrementally build row patterns by adding positions that most effectively reduce the Frobenius norm or related quality metrics such as the Kaporin number (Isotton et al., 2020, Recio et al., 8 Sep 2025).
Numerical dropping is governed by column-dependent thresholds, derived to ensure stability of the residual and nonsingularity of the computed factor (Jia et al., 2012).
3. Parallelization, Implementation, and Scalability
All FSAI construction algorithms (both static and adaptive) are inherently parallel, as each column or block can be processed independently. This property has led to highly scalable implementations on distributed-memory clusters and many-core GPU systems:
- Each processor or GPU handles its subset of rows or block-rows
- Graph enrichments and sparsity pattern updates are local; communication is minimal and limited to adjacency structures and block updates
- Dense systems for each row are solved with local QR or Cholesky; in GPU settings, batched dense solvers are utilized for small matrix systems
- Sparse matrix-vector application ("miniwarps") is tailored to hardware constraints, maximizing occupancy and bandwidth
- Global parallel efficiency is documented at up to hundreds of GPUs and thousands of MPI ranks for moderate fill levels and large problem sizes (millions of unknowns), with setup and solve times scaling nearly ideally (Isotton et al., 2020, Firmbach et al., 2024)
Pattern selection and fill controls are essential to maintain bounded memory and computational complexity. Adaptive incrementing (e.g., adding 2–5 new nonzeros per step over a bounded number of steps) typically yields robust preconditioners with apply complexity proportional to the total number of nonzeros in the factors (Isotton et al., 2020).
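The adaptive incrementing just described can be illustrated per column with a greedy Grote–Huckle-style SPAI criterion (candidate indices scored by $|A^{\top} r|$). The function `adaptive_pattern_column`, its parameters `max_adds` and `steps`, and the scoring rule are illustrative, not the exact aFSAI criterion of the cited papers:

```python
import numpy as np

def adaptive_pattern_column(A, j, max_adds=5, steps=3, tol=1e-10):
    """Grow the pattern of one approximate-inverse column greedily:
    solve the restricted least-squares problem, score candidate indices
    by |A^T r| (a SPAI-style proxy for residual reduction), admit the
    best `max_adds`, and repeat for at most `steps` rounds."""
    n = A.shape[0]
    P = [j]
    e = np.eye(n)[:, j]
    for _ in range(steps):
        g, *_ = np.linalg.lstsq(A[:, P], e, rcond=None)
        r = e - A[:, P] @ g
        if np.linalg.norm(r) < tol:
            break
        score = np.abs(A.T @ r)
        score[P] = 0.0                      # never re-admit existing entries
        new = [int(k) for k in np.argsort(-score)[:max_adds] if score[k] > tol]
        if not new:
            break
        P += new
    col = np.zeros(n)
    col[P] = np.linalg.lstsq(A[:, P], e, rcond=None)[0]
    return col, sorted(P)
```

Each column grows its pattern independently, so the loop parallelizes across columns exactly as in the static case.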
4. Extensions to Advanced System Structures and Problem Classes
FSAI preconditioning generalizes to mixed-dimensional block systems, singular M-matrices, and higher-order PDE discretizations:
Block-coupled systems:
Approximate block factorization preconditioners utilize FSAI for ill-conditioned sub-blocks (e.g., beam block in beam-solid interaction), providing efficient approximations to both block inverses and Schur complements. Static graph enrichment of the sparsity pattern is performed by powers of adjacency graphs, and post-filtering controls fill via thresholds. Weak scaling is demonstrated up to 1000 MPI ranks in large-scale civil engineering FSI applications (Firmbach et al., 2024).
Singular M-matrices:
To guarantee stability and well-posedness for singular, irreducible M-matrices, certain off-diagonal couplings are forbidden in the lower-/upper-triangular sparsity patterns. This ensures existence, uniqueness, non-negativity, and nonsingularity of the FSAI factors ($G_L$, $G_U$) and of the preconditioned matrices, which preserve the M-matrix properties (nonpositive off-diagonal entries, strict positivity on the diagonal). The FSAI construction proceeds by solving principal-submatrix systems for each row/column (Bick et al., 25 Dec 2025).
Higher-order problems and block-adaptive FSAI:
Multilevel FSAI construction for partition-of-unity and high-order PDE discretizations leverages block-partitioning and adaptive enrichment within V-cycle solvers. Nested FSAI steps allow for multi-level density control, while block-wise minimization of the Kaporin determinant ratio ensures algebraic robustness in the preconditioner (Recio et al., 8 Sep 2025).
5. Practical Tuning, Dropping Criteria, and Robustness
Practical effectiveness of FSAI preconditioners depends on careful control of sparsity and numerical dropping. Static and adaptive dropping criteria based on matrix norm bounds, column densities, and target residuals prevent breakdowns due to ill-posedness or over-dropping:
- Static dropping after the full least-squares solution, with a column-dependent tolerance; proven to maintain nonsingularity while at most doubling the maximum allowable residual (Jia et al., 2012).
- Adaptive dropping within iterative schemes: drop small entries on each adaptive step, updating tolerance dynamically based on current column sparsity and residual norm.
- Graph enrichment via powers of the adjacency matrix allows controlled expansion of fill for better accuracy, with subsequent post-filtering to eliminate negligible entries.
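A minimal post-filtering pass of the kind described above might look as follows. The row-relative threshold `tau` is an illustrative choice; the cited criteria are column-dependent and stability-aware:

```python
import numpy as np
import scipy.sparse as sp

def postfilter(G, tau=1e-2):
    """Drop entries of a sparse factor G that are small relative to the
    largest entry in their row, then compress the storage.  Operates
    in place on a copy of G's CSR data array."""
    G = sp.csr_matrix(G, copy=True)
    for i in range(G.shape[0]):
        start, end = G.indptr[i], G.indptr[i + 1]
        row = G.data[start:end]               # view into the CSR data array
        if row.size:
            row[np.abs(row) < tau * np.abs(row).max()] = 0.0
    G.eliminate_zeros()                       # remove explicit zeros
    return G
```

Filtering after construction keeps the setup phase unchanged while reducing the nonzeros that dominate the per-iteration apply cost.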
Fill parameters (maximum fill per row, drop tolerance, relative residual thresholds) are user-tuned; typical settings allow up to a few tens of adaptive steps (on the order of 50) per row/column, trading robust preconditioner quality against manageable compute cost.
6. Numerical Performance, Applications, and Comparative Analysis
FSAI preconditioners demonstrate competitive or superior performance compared to traditional techniques (Jacobi, ILU, classical sparse approximate inverses) in diverse benchmark settings:
- Direct QR/SVD-based FSAI construction yields predictable setup cost and avoids reliance on user-tuned iteration parameters, outperforming iterative power methods for challenging nonsymmetric problems (Byckling et al., 2012).
- Adaptive and block-FSAI implementations in multilevel solvers (with Chebyshev-4 smoothing) provide significant improvement in energy-norm convergence rates for biharmonic/triharmonic PDEs and capture anisotropies in local mesh structure (Recio et al., 8 Sep 2025).
- In block-coupled mixed-dimensional FSI systems, FSAI-based preconditioners, when coupled with AMG correction on Schur complements, reduce iteration counts and setup costs, with documented scalability to 1000 MPI ranks (Firmbach et al., 2024).
- For singular M-matrices, numerical tests on Markov-chain problems and graph Laplacians validate the stability, nonsingularity, and spectral clustering efficacy of the tailored FSAI construction (Bick et al., 25 Dec 2025).
- GPU-accelerated aFSAI achieves near-ideal scalability and throughput, with preconditioner application times approaching device-level SpMV bandwidth, while reducing Krylov iterations by factors of 5–20 (Isotton et al., 2020).
- Static and adaptive dropping substantially reduce preconditioner density and application time with negligible loss in iterative convergence, as evidenced by test matrices drawn from the UF Sparse Matrix Collection (Jia et al., 2012).
| Application Area | System Type | FSAI Role |
|---|---|---|
| Unstructured linear systems | SPD/nonsymmetric | Preconditioner for Krylov solvers |
| Mixed-dimensional FSI | Block-coupled | Schur approx., block smoother |
| Markov chains, Laplacians | Singular M-matrix | Pattern-adaptive inverse |
| Higher-order PDEs, PUM | Block-partitioned SPD | Multilevel smoother, V-cycle |
| Large-scale simulation (GPU/CPU) | Distributed storage | Highly-parallel preconditioner |
7. Theoretical Properties and Guarantees
Minimization in the Frobenius norm or the Kaporin metric clusters the spectrum of the preconditioned matrix near unity, improving convergence rates provided the conditioning of the factors remains moderate.
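For reference, the Kaporin number used as a quality metric above admits the standard form (a restatement of the general definition, not a formula copied from the cited papers): for an SPD preconditioned matrix $B = G A G^{T}$ of order $n$,
$$\beta(B) \;=\; \frac{\tfrac{1}{n}\operatorname{tr}(B)}{\det(B)^{1/n}} \;\ge\; 1,$$
with equality exactly when $B$ is a scalar multiple of the identity; adaptive FSAI variants grow the pattern so as to drive $\beta(B)$ toward this lower bound.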
Dimensionality of the FSAI map is maximized for non-overlapping diagonal patterns, and scale invariance is preserved by column normalization. Structured block patterns and permutation strategies (Dulmage–Mendelsohn, strongly connected components) augment numerical robustness in complex coupled problems.
Special pattern restrictions for singular M-matrices guarantee unique, non-negative, nonsingular FSAI factors and preserve structural properties of the original matrix through the preconditioned system. Computed (1,2)-inverses for the maximal pattern satisfy the corresponding Penrose conditions, ensuring algebraic consistency (Bick et al., 25 Dec 2025).