Factorized Preconditioning Architecture
- Factorized preconditioning architecture is defined as constructing a preconditioner by factorizing it into structured, often sparse, operators (e.g., triangular, block-diagonal) to accelerate convergence in iterative solvers.
- It transforms the spectral properties of matrices, enabling rapid convergence via efficient sparse operations such as matrix–vector products and triangular solves in large-scale systems.
- Recent innovations integrate neural and quantum methods with classical strategies, enhancing scalability, robustness, and applicability across PDE solvers, kernel methods, and deep learning optimization.
A factorized preconditioning architecture refers to any preconditioning strategy in which the preconditioner is constructed as a product (factorization) of structured, often sparse, operators—typically triangular, block-diagonal, multilevel, or compositionally layered forms. These architectures underpin a wide array of algorithms for accelerating iterative solutions to large linear systems, partial differential equations (PDEs), kernel methods, and variational quantum eigensolvers. The central motivation is to transform the spectrum of the original operator to promote rapid convergence in Krylov subspace or optimization solvers, while enabling efficient evaluation (often via sparse matrix–vector products or triangular solves) and scalable, parallelizable construction.
1. Mathematical Foundations of Factorized Preconditioning
Classical iterative solvers for linear systems $Ax = b$ rely on a preconditioner $M \approx A$ that clusters the eigenvalues/singular values of the preconditioned operator $M^{-1}A$ or $AM^{-1}$ near unity. Factorized preconditioning refers to constructing $M$ via explicit factorization, i.e., $M = M_1 M_2 \cdots M_k$, where each $M_i$ is chosen to be easily invertible or evaluable.
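As a concrete illustration (a standard identity, not tied to any one cited method), for a two-factor preconditioner $M = LU$ the split-preconditioned system reads

$$
(L^{-1} A\, U^{-1})\, y = L^{-1} b, \qquad x = U^{-1} y,
$$

so each application of the preconditioner costs one triangular solve per factor rather than an explicit inverse.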
Traditional examples include:
- Incomplete LU (ILU/IC): $A \approx LU$ (ILU) or $A \approx LL^T$ (incomplete Cholesky for SPD $A$), with the factors computed to match the sparsity of $A$; $M = LU$ or $M = LL^T$ is used as the preconditioner (Hosaka et al., 2023).
- Sparse Approximate Inverse, FSAI: $M^{-1} = G^T G$ with $G$ lower-triangular and sparse, constructed by enforcing $G \approx L^{-1}$ (for $A = LL^T$) via local dense solves and diagonal scaling (Isotton et al., 2020).
- Block and Multilevel Factorizations: Decompose $A$ recursively over blocks or multilevel Schur complements, yielding product-form preconditioners whose factors are per-level elimination and skeletonization operators, as in hierarchical interpolative factorizations (Feliu-Fabà et al., 2020, Feliu-Fabà et al., 2018).
- Data-driven/Neural Factorizations: Neural operators or GNNs produce the triangular factors $L$ (and $U$) via sparse lower/upper-triangular parameterizations; “learned” incomplete factorizations reproduce and extend IC/ILU structure in a factorized way (Li et al., 10 Dec 2024, Häusner et al., 2023, Häusner et al., 12 Sep 2024).
The factorized architecture is distinguished from direct inverse approximations by its parameterization and the ability to compose/adapt factors (e.g., by sparsity pattern, block structure, or neural correction), crucial for large-scale or GPU-accelerated computation.
2. Classical and Modern Variants
Several canonical factorized preconditioners have been extensively developed and analyzed:
2.1 Incomplete LU and Cholesky
For a sparse $A$, the incomplete LU preconditioner $M = \tilde{L}\tilde{U}$ is built by retaining only those entries of the triangular factors present in a prescribed sparsity pattern (typically the pattern of $A$ or a superset). The preconditioned system is then solved by left or right application of $M^{-1}$, exploiting fast triangular solves (Hosaka et al., 2023). A good ILU reduces iteration counts from hundreds or thousands of steps (unpreconditioned) to an order of magnitude fewer.
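A minimal sketch of this pattern using SciPy's stock incomplete LU (the test matrix and drop tolerance are illustrative placeholders, not taken from the cited work):

```python
import numpy as np
import scipy.sparse as sp
import scipy.sparse.linalg as spla

# Illustrative sparse test matrix: 2D Laplacian on a 40x40 grid.
n = 40
lap1d = sp.diags([-1, 2, -1], [-1, 0, 1], shape=(n, n))
A = sp.kronsum(lap1d, lap1d).tocsc()
b = np.ones(A.shape[0])

# Incomplete LU with a drop tolerance: M = L U with sparsity close to that of A.
ilu = spla.spilu(A, drop_tol=1e-4, fill_factor=10)

# Wrap M^{-1} as a linear operator; each application is two triangular solves.
M = spla.LinearOperator(A.shape, matvec=ilu.solve)

# Compare preconditioned vs. unpreconditioned GMRES inner-iteration counts.
its_pre, its_raw = [], []
x_pre, _ = spla.gmres(A, b, M=M, callback=its_pre.append, callback_type="pr_norm")
x_raw, _ = spla.gmres(A, b, callback=its_raw.append, callback_type="pr_norm")
print(len(its_pre), len(its_raw))  # the preconditioned count is typically far smaller
```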
2.2 Adaptive FSAI
FSAI constructs a factor $G$ that is sparse and lower-triangular such that $G^T G \approx A^{-1}$ (Isotton et al., 2020). A core component is the adaptive sparsity strategy: the sparsity pattern of $G$ is grown row by row, guided by the gradient of the Kaporin conditioning number, until a user-specified error reduction is achieved. Because each row is computed independently, FSAI is particularly well suited to distributed-memory and GPU acceleration, achieving strong and weak scaling to thousands of GPUs.
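A minimal static-pattern sketch (fixing the sparsity of $G$ to the lower triangle of $A$ rather than growing it adaptively as above; the routine name is illustrative):

```python
import numpy as np
import scipy.sparse as sp

def fsai_static(A):
    """Static-pattern FSAI: build a sparse lower-triangular G ~ L^{-1} for SPD
    A = L L^T, with the sparsity of G fixed to the lower triangle of A.
    Each row is an independent small dense solve, which is what makes FSAI
    embarrassingly parallel across rows."""
    A = sp.csr_matrix(A)
    n = A.shape[0]
    rows, cols, vals = [], [], []
    for i in range(n):
        # Pattern P_i: column indices j <= i where A[i, j] != 0 (i always included).
        start, end = A.indptr[i], A.indptr[i + 1]
        P = np.array(sorted({j for j in A.indices[start:end] if j <= i} | {i}))
        # Small dense subproblem: A[P, P] g = e (unit vector at the diagonal slot).
        Ap = A[np.ix_(P, P)].toarray()
        e = np.zeros(len(P))
        e[-1] = 1.0
        g = np.linalg.solve(Ap, e)
        # Scale so that diag(G A G^T) = 1.
        g /= np.sqrt(g[-1])
        rows.extend([i] * len(P))
        cols.extend(P.tolist())
        vals.extend(g.tolist())
    return sp.csr_matrix((vals, (rows, cols)), shape=(n, n))

# The preconditioner is then applied as M^{-1} r = G.T @ (G @ r),
# i.e., two sparse matrix-vector products and no triangular solves.
```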
2.3 Hierarchical Interpolative Factorization (HIF/PHIF)
Hierarchical factorizations build an approximate factorization of $A$ (or of $A^{-1}$) by recursive elimination of well-chosen blocks (local Cholesky or Schur complements), skeletonization (interpolative decomposition), and local preconditioning (block Jacobi or incomplete factorizations at each level). The resulting factorized preconditioner can be built and applied with linear or quasilinear complexity in both 2D and 3D, and yields iteration counts with little or no dependence on problem size, in contrast to full Cholesky or classic IC (Feliu-Fabà et al., 2020, Feliu-Fabà et al., 2018).
2.4 Block and Multilevel/Recursive Preconditioners
Recursive multilevel or block-preconditioners (e.g., AMES) partition $A$ into blocks, recursively assemble Schur complements and incomplete factorizations on block/leaf levels, and then combine explicit and implicit approximate inverses across the hierarchy (Bu et al., 2015). Overlap strategies and sparse explicit inverse blocks are key for robustness and reducing iteration counts at fixed memory and computational cost.
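A minimal two-level sketch (a generic $2 \times 2$ block LDU factorization with ILU-approximated diagonal blocks and Schur complement; an illustration of the idea, not the AMES algorithm itself):

```python
import numpy as np
import scipy.sparse as sp
import scipy.sparse.linalg as spla

def block_factorized_preconditioner(A, k):
    """Generic 2x2 block LDU preconditioner: with A = [[A11, A12], [A21, A22]]
    (the first k unknowns form block 1), the exact factorization
        A = [I 0; A21 A11^-1 I] [A11 0; 0 S] [I A11^-1 A12; 0 I],
        S = A22 - A21 A11^-1 A12,
    is approximated by replacing A11 and S with incomplete LU factorizations."""
    A = sp.csc_matrix(A)
    A11, A12 = A[:k, :k], A[:k, k:]
    A21, A22 = A[k:, :k], A[k:, k:]
    ilu11 = spla.spilu(A11.tocsc(), drop_tol=1e-4)
    # Crude Schur-complement approximation via dense columns (kept simple here;
    # multilevel codes use sparse approximations and recursion instead).
    A11inv_A12 = np.column_stack(
        [ilu11.solve(A12[:, j].toarray().ravel()) for j in range(A12.shape[1])]
    )
    S = sp.csc_matrix(A22 - sp.csc_matrix(A21 @ A11inv_A12))
    iluS = spla.spilu(S, drop_tol=1e-4)

    def apply(r):
        r1, r2 = r[:k], r[k:]
        y1 = ilu11.solve(r1)                  # solve with the leading block
        y2 = iluS.solve(r2 - A21 @ y1)        # approximate Schur-complement solve
        x1 = y1 - ilu11.solve(A12 @ y2)       # back-substitution across the blocks
        return np.concatenate([x1, y2])

    return spla.LinearOperator(A.shape, matvec=apply)

# Usage: M = block_factorized_preconditioner(A, k); pass M to GMRES/CG as the
# preconditioner, exactly as with any other factorized form.
```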
3. Data-Driven and Neural Factorized Preconditioners
Recent advances leverage neural networks—especially graph neural networks (GNNs)—to build or refine factorized preconditioners:
3.1 GNN-Enhanced IC/ILU
GNNs are trained either to predict corrections to classical IC factors (a “delta” added to $L$) or to learn the triangular or block factors directly. Architectures mirror the underlying triangular solves through directed, positional edge features and local node statistics (Li et al., 10 Dec 2024, Häusner et al., 12 Sep 2024). Enforced positive definiteness and sparsity ensure the resulting operator is stable and cheap to apply. Training is stochastic, often using matrix–vector products and Hutchinson-type estimators of Frobenius-norm losses.
Key insight: Directly predicting triangular factors tends to allocate model capacity to diagonals already well-handled by IC; adding neural corrections (“IC+GNN delta”) enables precise refinement of off-diagonals, reducing PCG iteration counts by ≈25% vs. IC alone (Li et al., 10 Dec 2024).
3.2 Neural Incomplete Factorization
Approaches like NeuralIF (Häusner et al., 2023) construct $A \approx LL^T$ with a GNN-parameterized sparse lower-triangular factor $L$. Message-passing steps are explicitly designed to reflect the lower-triangular structure, with skip connections and two-direction passes mimicking the actions of $L$ and $L^T$. Consistency, SPD structure, and absence of fill-in outside the pattern of $A$ are maintained. Models are compact (∼2k parameters), generalize to matrices outside the training distribution, and have been shown to match IC(0) performance at reduced build time.
3.3 Direct NN-based Cholesky (Compile/Online-Time)
A sparse two-layer linear network, with masked weights corresponding to the incomplete Cholesky pattern, learns the lower-triangular factor $L$ directly by regression on a set of randomly sampled matrix–vector products (Booth et al., 1 Mar 2024). The cost is amortized if multiple right-hand sides are solved; crucially, neural Cholesky always succeeds (never fails on indefinite pivots) and provides an SPD preconditioner in all tested cases.
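A minimal PyTorch sketch in the spirit of this masked-regression idea (mask, initialization, loss, and optimizer settings are illustrative assumptions, not the authors' configuration):

```python
import torch

def learn_masked_cholesky(A, batch=32, lr=1e-2, steps=500):
    """Learn a lower-triangular L restricted to the sparsity of tril(A) such that
    L @ L.T @ x ~ A @ x on randomly sampled vectors x (a stochastic regression
    on matrix-vector products). A is a dense SPD torch tensor here for brevity."""
    A = A.double()
    n = A.shape[0]
    mask = (torch.tril(A) != 0).double()          # incomplete-Cholesky pattern
    # Jacobi-like initialization: L starts as sqrt(diag(A)).
    L = torch.diag(torch.sqrt(torch.diagonal(A))).requires_grad_(True)
    opt = torch.optim.Adam([L], lr=lr)
    for _ in range(steps):
        x = torch.randn(n, batch, dtype=torch.double)
        Lm = L * mask                             # enforce the sparsity pattern
        loss = ((Lm @ (Lm.T @ x) - A @ x) ** 2).mean()
        opt.zero_grad()
        loss.backward()
        opt.step()
    with torch.no_grad():
        return L * mask

# The learned factor is applied like any incomplete Cholesky factor,
# M^{-1} r = L^{-T} (L^{-1} r); unlike classical IC, the regression never hits
# a pivot breakdown, so some factor is always produced.
```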
4. Quantum and Specialized Factorized Preconditioning
The systematic integration of factorized preconditioning into non-classical settings demonstrates its architectural flexibility:
- Quantum Variational Linear Solvers (VQLS): Preconditioning via incomplete LU ($A \approx LU$) is incorporated into the variational quantum architecture by classically forming the preconditioned system $M^{-1}Ax = M^{-1}b$ with $M = LU$ and using quantum circuits to minimize the corresponding preconditioned cost function, yielding a dramatic reduction (2–3×) in the required quantum circuit depth, argued to be essential in the NISQ regime (Hosaka et al., 2023).
- Kernel and Block Structures: Block-wise Schur-complement and low-rank-plus-sparse factorized corrections (as in AFN for regularized kernel systems) (Zhao et al., 2023), combining a near-optimal Nyström approximation with a sparse FSAI inverse correction, can be constructed cheaply and substantially improve iteration scaling.
- Spectral-Element and PDE Systems: Sum-factorized and diagonalization-based preconditioners that fold interior/face blocks into minimal, block-structured factorizations enable highly efficient runtimes for high-order Helmholtz problems (Huismann et al., 2016).
- Deep Learning Optimizers: Block Kronecker and two-level (coarse+fine) factorizations in Fisher-matrix preconditioning for natural gradient optimization (e.g., 2L-KFAC) efficiently capture global and local curvature (Tselepidis et al., 2020).
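A minimal sketch of the Kronecker-factored idea for a single dense layer (standard K-FAC algebra; the function name and damping value are illustrative, and this is not the two-level scheme of the cited work):

```python
import numpy as np

def kfac_preconditioned_step(grad_W, acts, grads_out, damping=1e-3):
    """Kronecker-factored natural-gradient step for one dense layer.
    The layer's Fisher block is approximated as F ~ A (x) S with
      A = E[a a^T]  (input-activation second moment, shape in x in),
      S = E[g g^T]  (output-gradient second moment, shape out x out),
    so F^{-1} vec(grad_W) = vec( S^{-1} grad_W A^{-1} )."""
    A = acts.T @ acts / acts.shape[0]                   # (in, in)
    S = grads_out.T @ grads_out / grads_out.shape[0]    # (out, out)
    A_damped = A + damping * np.eye(A.shape[0])
    S_damped = S + damping * np.eye(S.shape[0])
    # Preconditioned gradient: two small dense solves instead of one huge one.
    return np.linalg.solve(S_damped, grad_W) @ np.linalg.inv(A_damped)

# Example shapes: grad_W is (out, in); acts is (batch, in); grads_out is (batch, out).
```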
5. Implementation Principles and Performance
5.1 Workflow Summary
A generic workflow for factorized preconditioning in modern settings consists of:
- Pattern Selection: Define sparsity or block structure for the factors (e.g., the lower triangle of $A$, block partitions, geometric levels).
- Factor Computation:
- Classical: Compute incomplete LU/Cholesky or block-ILU via dropping/filling, with parallel QR/SVD when using explicit inverse/factorized sparse approximate inverse.
- Neural: Train a GNN or neural operator with graph-structured features, optimized on data with a (stochastic) matrix–vector loss, e.g., a sampled Frobenius residual $\|LL^T x - Ax\|_2^2$, or spectrum-control surrogates.
- Quantum: Classically precondition the matrix/vector and adapt the variational ansatz/state preparation to the preconditioned problem.
- Deployment: Factors are stored (often in CSR/COO format for compatibility with sparse libraries); the preconditioning step in PCG/GMRES/other Krylov solvers is performed by sparse triangular solves or matrix–vector products (see the end-to-end sketch after this list).
- Parameter Tuning: Sparsity levels, block sizes, and training mini-batch sizes are user-tunable, frequently with negligible increase in memory overhead over classical approaches.
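An end-to-end deployment sketch (the factor here is an exact Cholesky masked to the no-fill pattern, standing in for whichever construction route above supplies $L$; the solver and model problem are illustrative):

```python
import numpy as np
import scipy.sparse as sp
import scipy.sparse.linalg as spla

def triangular_factor_operator(L_csr):
    """Deployment wrapper: a stored sparse lower-triangular factor L
    (from IC/ILU, FSAI, or a learned model) is exposed to Krylov solvers as
    M^{-1} r = L^{-T} (L^{-1} r), i.e., two sparse triangular solves."""
    Lt_csr = sp.csr_matrix(L_csr.T)
    def matvec(r):
        y = spla.spsolve_triangular(L_csr, r, lower=True)
        return spla.spsolve_triangular(Lt_csr, y, lower=False)
    return spla.LinearOperator(L_csr.shape, matvec=matvec)

# Small SPD model problem (2D Laplacian) for illustration.
n = 30
lap = sp.diags([-1, 2, -1], [-1, 0, 1], shape=(n, n))
A = sp.kronsum(lap, lap).tocsr()
b = np.ones(A.shape[0])

# Stand-in factor: exact dense Cholesky masked to the lower-triangular pattern
# of A, mimicking a no-fill incomplete factor.
L_dense = np.linalg.cholesky(A.toarray())
pattern = np.abs(sp.tril(A).toarray()) > 0
L = sp.csr_matrix(np.where(pattern, L_dense, 0.0))

M = triangular_factor_operator(L)
x, info = spla.cg(A, b, M=M)
print(info)  # 0 signals convergence of preconditioned CG
```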
5.2 Scaling and Robustness
Factorized preconditioners are typically linear in the number of nonzeros of $A$ in both setup and application cost when sparsity is controlled. Modern advances enable strong and weak scaling to thousands of processors/GPUs (e.g., aFSAI in (Isotton et al., 2020)), and are robust to matrix size, problem domain, and distributional shift (as in (Li et al., 10 Dec 2024, Häusner et al., 2023)). The capacity to handle indefinite or highly ill-conditioned systems depends on architectural choices; e.g., hierarchical block-Jacobi preconditioning (PHIF) can maintain preconditioner stability for problems with very high coefficient contrast (Feliu-Fabà et al., 2018).
6. Applications, Trade-Offs, and Extensions
Factorized preconditioning architectures are now ubiquitous in classical and quantum scientific computing, PDE solvers, kernel methods, deep learning, and Bayesian inference. Their principal advantages include:
- Efficiency and Parallelism: Embarrassingly parallel builds (columnwise in explicit factor approximations, block/row in aFSAI), ideal for GPU and distributed settings.
- Adaptability: Amenable to hybrid neural/classical corrections, block/multilevel extensions, and problem-specific structural reuse.
- Architectural Generality: The recipe extends to any setting where an approximate inverse, diagonalization, or spectrum transformation is beneficial—quantum VQLS, kernel preconditioning, deep curvature methods, variational inference via flow-based MCMC with factorized normalizing flows.
Trade-offs include the cost of one-time setup (often amortized), memory/storage tradeoffs in block size or pattern expansion, and the necessity of careful regularization and stabilization (e.g., positive definiteness, minimum diagonal entries) in data-driven settings.
Ongoing research investigates deeper neural architectures for learned sparsity, multilevel/factorized neural algebraic multigrid (NAMG), and high-dimensional probabilistic flow-MCMC preconditioning where splitting linear and highly nonlinear blocks—i.e., factorizing the preconditioner between domains—yields both superior exploration and training/data efficiency (Nabergoj et al., 4 Nov 2025).
In summary, a factorized preconditioning architecture leverages structured operator products to achieve rapid spectrum transformation and scalable application within iterative and optimization algorithms. The approach encompasses both classic algorithmic and data-driven paradigms, accelerating scientific computing across traditional and emerging domains.