
CG-FFT: Conjugate Gradient with FFT Acceleration

Updated 18 November 2025

CG-FFT is a numerical method that combines the conjugate gradient solver with FFT acceleration to tackle large-scale, structured linear systems arising from PDEs, homogenization, and inverse problems.
FFT acceleration enables rapid evaluation of convolution operators and efficient preconditioning, such as Green’s and Green-Jacobi methods, with O(N log N) complexity.
CG-FFT has proven benefits in computational mechanics, lattice gauge theory, and quantum physics, significantly reducing iteration counts and wall-time compared to traditional solvers.

Conjugate Gradient with Fast Fourier Transform (CG-FFT) denotes a broadly applicable numerical strategy in which the conjugate gradient (CG) algorithm is combined with fast Fourier transform (FFT)-based acceleration to solve large-scale linear systems. These systems typically arise from discretizations of elliptic PDEs, integral equations, or inverse problems on regular grids, where the resulting operators have circulant, block-circulant, or block-Toeplitz structure. FFT acceleration is used for the efficient application of structured operators (e.g., convolutions, block-Toeplitz/Hankel matrices), for preconditioning (via Green’s operators or reference-material stencils), or both. This paradigm underlies efficient solvers in computational mechanics, materials science, lattice gauge theory, inverse problems, and quantum physics.

1. Mathematical Framework and Discretization

The CG-FFT approach is most naturally formulated for linear systems $A u = b$ where $A$ represents a structured operator derived from PDEs or integral equations discretized on a regular grid. Classical contexts include:

Multiscale modeling of linear elasticity on periodic cells, where $A = K = B^T W C B$ encodes the discrete linear equilibrium, $u$ is the nodal displacement fluctuation, and $C(x)$ is a (possibly high-contrast) tensor evaluated at quadrature points (Ladecký et al., 4 Aug 2025).
Homogenization of heterogeneous materials via trigonometric collocation or Galerkin schemes, leading to Lippmann–Schwinger systems of the form $(I + B) e = e^0$ , with $B$ a discrete convolution, efficiently applied with FFT (Zeman et al., 2010, Vondřejc et al., 2011, Mishra et al., 2015).
Inverse problems governed by time-invariant PDEs, with block Toeplitz Hessians enabling convolutional FFT-based operator application (Venkat et al., 18 Jul 2024).
Gauge fixing in lattice field theory, where the gradient of the gauge-fixing functional and its Fourier-accelerated version are key in updating the gauge fields (Hudspith, 2014, Hudspith, 2014).

The discretization typically leverages regular grids (for block-circulant structure) or spectral spaces (Fourier, Chebyshev). For variational or weak formulations (e.g., finite element or Fourier-Galerkin), operators such as the gradient, divergence, or elasticity tensor are assembled in sparse or block-diagonal forms. FFT acceleration exploits circulant or block-circulant operator structure, with FFTs diagonalizing constant-coefficient reference problems.
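The diagonalization property can be checked numerically. The following is a minimal sketch, assuming a 1D periodic three-point Laplacian stencil as the circulant operator; the helper names `circulant` and `circulant_matvec_fft` are illustrative and not taken from the cited works.

```python
import numpy as np

def circulant(c):
    """Dense circulant matrix whose first column is c (for checking only)."""
    n = len(c)
    return np.array([[c[(i - j) % n] for j in range(n)] for i in range(n)])

def circulant_matvec_fft(c, x):
    """Apply the circulant matrix with first column c to x in O(n log n)."""
    # A circulant mat-vec is a circular convolution, i.e. a pointwise
    # product in Fourier space.
    return np.fft.ifft(np.fft.fft(c) * np.fft.fft(x)).real

# First column of the periodic 1D Laplacian stencil [-1, 2, -1]
n = 8
c = np.zeros(n)
c[0], c[1], c[-1] = 2.0, -1.0, -1.0
x = np.arange(n, dtype=float)

y_dense = circulant(c) @ x             # O(n^2) reference
y_fft = circulant_matvec_fft(c, x)     # FFT-based evaluation
eigs_fft = np.fft.fft(c).real          # eigenvalues are the DFT of the first column
```

The two products agree to rounding error, and the DFT of the first column reproduces the eigenvalues of the dense matrix, which is what makes inversion of constant-coefficient reference operators a pointwise division in Fourier space.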

2. FFT-Based Operator and Preconditioner Application

FFT serves two principal roles: rapid evaluation of operator action (matrix-vector products), and efficient preconditioning via structured reference operators.

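Operator application: In the case of convolution, the action $y = B u$ is evaluated as $y = F^{-1} ( \hat{\Gamma}^0 \cdot (F (D u)) )$, where $F$ and $F^{-1}$ are the FFT and inverse FFT, $\hat{\Gamma}^0$ is a diagonal multiplier in Fourier space, and $D$ is the real-space operator applied before the transform (in Lippmann–Schwinger schemes, a pointwise multiplication by the material contrast) (Zeman et al., 2010, Vondřejc et al., 2011).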
Green’s operator preconditioning: When a constant-coefficient reference operator $K^0$ is block-circulant, its inverse (the discrete Green operator $G$ ) is diagonal in Fourier space. Application requires forward FFT, diagonal scaling, and inverse FFT, preserving overall $O(N \log N)$ cost (Ladecký et al., 4 Aug 2025).
Composite preconditioners: Enhanced preconditioners, such as the Green-Jacobi preconditioner $\Gamma_J = D^{-1/2} G D^{-1/2}$ , exploit both local (Jacobi, $D$ ) and global (Green, $G$ ) structure, further improving conditioning without compromising per-iteration complexity (Ladecký et al., 4 Aug 2025).
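The Green's operator apply (forward FFT, diagonal scaling, inverse FFT) can be sketched as follows. This is an illustration under stated assumptions, not the implementation of the cited papers: the reference operator is taken to be a 1D periodic finite-difference operator $K^0 = -c_0 \Delta_h + \sigma I$, and the shift `sigma` is a hypothetical addition that keeps $K^0$ invertible without projecting out constants.

```python
import numpy as np

def reference_symbol(n, c0=1.0, sigma=1.0):
    """Fourier symbol of K0 = -c0 * (periodic 3-point Laplacian) + sigma * I."""
    h = 1.0 / n
    k = np.arange(n)
    return c0 * (2.0 - 2.0 * np.cos(2.0 * np.pi * k / n)) / h**2 + sigma

def apply_K0(u, c0=1.0, sigma=1.0):
    """Real-space action of the reference operator K0 (used here as a check)."""
    h = 1.0 / len(u)
    lap = (np.roll(u, -1) - 2.0 * u + np.roll(u, 1)) / h**2
    return -c0 * lap + sigma * u

def apply_green(r, symbol):
    """G r = F^{-1}( F(r) / symbol ): forward FFT, diagonal scaling, inverse FFT."""
    return np.fft.ifft(np.fft.fft(r) / symbol).real

n = 64
rng = np.random.default_rng(0)
r = rng.standard_normal(n)
symbol = reference_symbol(n)
roundtrip = apply_green(apply_K0(r), symbol)   # G K0 r, which should reproduce r
```

Because `symbol` depends only on the grid and the reference coefficient, it can be precomputed once and reused in every CG iteration.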

In block-Toeplitz/Hankel contexts (e.g., inverse problems), Hessian application reduces to a circulant-embedded convolution, so each Hessian-vector multiplication is performed as one forward FFT, pointwise complex multiplication, and one inverse FFT per spatial block (Venkat et al., 18 Jul 2024).
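The circulant embedding can be illustrated on a scalar Toeplitz matrix. The sketch below is a simplified stand-in for the block-Toeplitz case in the cited work: it embeds an $n \times n$ Toeplitz matrix in a $2n \times 2n$ circulant one, so a single forward FFT, a pointwise multiplication, and one inverse FFT reproduce the product. The function names are illustrative.

```python
import numpy as np

def toeplitz_dense(col, row):
    """Dense Toeplitz matrix from its first column and first row (checking only)."""
    n = len(col)
    return np.array([[col[i - j] if i >= j else row[j - i] for j in range(n)]
                     for i in range(n)])

def toeplitz_matvec_fft(col, row, x):
    """Toeplitz mat-vec via embedding in a circulant matrix of size 2n."""
    n = len(col)
    # First column of the embedding circulant: [col, 0, row[n-1], ..., row[1]]
    c = np.concatenate([col, [0.0], row[:0:-1]])
    x_pad = np.concatenate([x, np.zeros(n)])       # zero-pad the input to 2n
    y = np.fft.ifft(np.fft.fft(c) * np.fft.fft(x_pad))
    return y[:n].real                              # first n entries are T @ x

rng = np.random.default_rng(1)
n = 16
col = rng.standard_normal(n)
row = rng.standard_normal(n)
row[0] = col[0]                                    # shared diagonal entry
x = rng.standard_normal(n)

y_dense = toeplitz_dense(col, row) @ x
y_fft = toeplitz_matvec_fft(col, row, x)
```

In practice the transform of `c` would be computed once and cached, so each subsequent product costs two FFTs of length $2n$.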

3. Conjugate Gradient Algorithm with FFT Acceleration

The CG algorithm solves symmetric positive-definite (or projected SPD) systems by constructing optimal search directions and performing matrix-vector products accelerated by FFT-based routines. At each iteration, the dominant costs are:

One or two FFT calls per operator or preconditioner application.
Sparse or block-diagonal real-space matrix-vector products, scaling as $O(N)$.
Vector dot products and axpy-type updates, $O(N)$.

A typical preconditioned CG iteration for $K u = f$ with preconditioner $M^{-1}$ (possibly Green or Green-Jacobi) involves:

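import numpy as np

def pcg(K, f, M_inv, u, tol, maxiter):
    """Preconditioned CG for K u = f; M_inv(r) applies the preconditioner."""
    r = f - K @ u                    # initial residual
    z = M_inv(r)                     # e.g., Green or Green-Jacobi via FFTs and scalings
    p = z.copy()                     # first search direction
    rz = np.dot(r, z)
    for k in range(maxiter):
        Ap = K @ p                   # sparse mat-vec, O(N)
        alpha = rz / np.dot(p, Ap)   # step length along p
        u = u + alpha * p
        r = r - alpha * Ap
        if np.linalg.norm(r) < tol:  # converged: skip the next preconditioner apply
            break
        z = M_inv(r)                 # FFT-based precond: O(N log N)
        rz_new = np.dot(r, z)
        beta = rz_new / rz
        p = z + beta * p             # next K-conjugate direction
        rz = rz_new
    return u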

(Ladecký et al., 4 Aug 2025, Zeman et al., 2010, Venkat et al., 18 Jul 2024)

Each preconditioner apply commonly involves two FFTs (forward/backward) and $O(N)$ diagonal operations. This architecture leads to quasilinear, $O(N \log N)$ , per-iteration cost.
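To make the per-iteration structure concrete, here is a self-contained sketch on a 1D periodic model problem $-(c\,u')' + \sigma u = f$. Everything specific in it is an assumption made for illustration: the finite-difference discretization, the smooth coefficient, the shift `sigma` (added so the operator is SPD without a projection), the random right-hand side, and the relative-residual stopping rule. It compares Green-preconditioned CG against unpreconditioned CG.

```python
import numpy as np

def make_problem(n, contrast=10.0, sigma=1.0):
    """Matrix-free K and FFT-based Green preconditioner on n periodic grid points."""
    h = 1.0 / n
    x_half = (np.arange(n) + 0.5) * h
    # Smooth coefficient between 1 and `contrast`, sampled at cell interfaces
    c = 1.0 + (contrast - 1.0) * 0.5 * (1.0 + np.sin(2.0 * np.pi * x_half))

    def apply_K(u):
        flux = c * (np.roll(u, -1) - u) / h            # c * forward difference
        return -(flux - np.roll(flux, 1)) / h + sigma * u

    # Fourier symbol of the constant-coefficient reference operator K0
    c0 = c.mean()
    k = np.arange(n)
    symbol = c0 * (2.0 - 2.0 * np.cos(2.0 * np.pi * k / n)) / h**2 + sigma

    def apply_green(r):
        return np.fft.ifft(np.fft.fft(r) / symbol).real  # two FFTs + O(n) scaling

    return apply_K, apply_green

def pcg(apply_K, f, apply_M_inv, tol=1e-10, maxiter=5000):
    """Preconditioned CG; returns the iterate and the number of iterations used."""
    u = np.zeros_like(f)
    r = f - apply_K(u)
    z = apply_M_inv(r)
    p = z.copy()
    rz = r @ z
    f_norm = np.linalg.norm(f)
    for it in range(1, maxiter + 1):
        Kp = apply_K(p)
        alpha = rz / (p @ Kp)
        u += alpha * p
        r -= alpha * Kp
        if np.linalg.norm(r) <= tol * f_norm:          # relative residual test
            return u, it
        z = apply_M_inv(r)
        rz_new = r @ z
        p = z + (rz_new / rz) * p
        rz = rz_new
    return u, maxiter

n = 256
apply_K, apply_green = make_problem(n)
f = np.random.default_rng(0).standard_normal(n)        # hypothetical right-hand side
u_green, it_green = pcg(apply_K, f, apply_green)
u_plain, it_plain = pcg(apply_K, f, lambda r: r.copy())  # identity preconditioner
```

On this model the Green-preconditioned run should need far fewer iterations than the unpreconditioned one, since the preconditioned condition number is bounded by the coefficient contrast while that of $K$ grows with the grid resolution.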

4. Convergence, Spectral Conditioning, and Performance

The effectiveness of CG-FFT is dictated by the conditioning of the preconditioned system. Key observations:

Standard Green preconditioning: The condition number $\kappa(G K)$ is independent of mesh size $N$ but grows rapidly with phase contrast and smoothness of $C(x)$ (Ladecký et al., 4 Aug 2025, Zeman et al., 2010).
Jacobi (diagonal) preconditioning: Strongly mesh-size dependent, with $\kappa$ growing as $N$ , but less dependent on phase contrast.
Green-Jacobi (composite) preconditioning: Offers a balance, yielding a condition number only mildly dependent on mesh size and decreasing with local smoothness (neighboring contrast), drastically reducing the required CG iterations for smoothly varying, high-contrast coefficients (Ladecký et al., 4 Aug 2025).
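The mesh and contrast dependence of the Green and Jacobi preconditioners can be explored on a small dense model. The sketch below is an assumption-laden illustration (1D periodic finite differences, a two-phase coefficient, and a hypothetical shift `sigma` that keeps the matrices SPD); it does not reproduce the Green-Jacobi construction or the experiments of the cited paper.

```python
import numpy as np

def dense_operators(n, contrast, sigma=1.0):
    """Dense K = B^T C B + sigma I and its constant-coefficient reference K0."""
    h = 1.0 / n
    # Periodic forward difference: (B u)_i = (u_{i+1} - u_i) / h
    B = (np.roll(np.eye(n), 1, axis=1) - np.eye(n)) / h
    c = np.where(np.arange(n) < n // 2, contrast, 1.0)   # two-phase coefficient
    K = B.T @ (c[:, None] * B) + sigma * np.eye(n)
    K0 = c.mean() * (B.T @ B) + sigma * np.eye(n)
    return K, K0

def cond_green(K, K0):
    """Condition number of K0^{-1} K, computed from L^{-1} K L^{-T}, K0 = L L^T."""
    L = np.linalg.cholesky(K0)
    M = np.linalg.solve(L, np.linalg.solve(L, K).T)
    w = np.linalg.eigvalsh(0.5 * (M + M.T))
    return w[-1] / w[0]

def cond_jacobi(K):
    """Condition number of D^{-1/2} K D^{-1/2} with D = diag(K)."""
    d = np.diag(K)
    w = np.linalg.eigvalsh(K / np.sqrt(np.outer(d, d)))
    return w[-1] / w[0]

results = {}
for n in (32, 64):
    for contrast in (10.0, 100.0):
        K, K0 = dense_operators(n, contrast)
        results[(n, contrast)] = (cond_green(K, K0), cond_jacobi(K))
```

For this model the Green-preconditioned condition number is bounded by $c_{\max}/c_{\min}$ regardless of $n$, whereas the Jacobi-scaled one grows as the grid is refined, consistent with the qualitative observations listed above.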

Iteration counts and wall-time speedups in representative scenarios are as follows:

Problem Type	Green PCG Iter.	Green-Jacobi (J-FFT) PCG Iter.
Sharp inclusion, contrast $10^4$	75	38
Smooth contrast, filtered $10^4$	320	42
Phase-field topology optimization	3000	120

These reductions (up to 25× fewer CG iterations for smooth high-contrast data) are observed with the Green-Jacobi preconditioner (Ladecký et al., 4 Aug 2025). For lattice gauge fixing, the Fourier-accelerated conjugate gradient (FACG) method typically yields a 2–4× wall-time improvement over Fourier-accelerated steepest descent (FASD) (Hudspith, 2014, Hudspith, 2014). In block-Toeplitz inverse problems, FFT-based Hessian actions are 3–4 orders of magnitude faster than conventional matrix-free PDE-based Hessian-vector evaluations (Venkat et al., 18 Jul 2024).

5. Application Domains and Practical Algorithms

CG-FFT and its variants have broad applicability:

Multiscale materials and periodic homogenization: Elastic or conductivity problems on regular periodic grids, with or without high contrast and smooth microstructure (Ladecký et al., 4 Aug 2025, Zeman et al., 2010, Mishra et al., 2015, Vondřejc et al., 2011).
Phase-field fracture, topology optimization, and grid-adaptive schemes: Featuring smoothly varying, high-contrast material data, where improved spectral preconditioning is critical (Ladecký et al., 4 Aug 2025).
Lattice gauge theory: Landau and Coulomb gauge fixing via non-linear CG with Fourier-accelerated gradients (Hudspith, 2014, Hudspith, 2014).
Inverse problems and PDE-constrained optimization: Newton-Krylov methods for time-invariant systems, exploiting block Toeplitz Hessians and FFT-accelerated Hessian actions (Venkat et al., 18 Jul 2024).
Quantum and wave equations: Stationary Schrödinger and other dispersive equations with FFT-based preconditioned CG leveraging operator splitting (Serov et al., 2015).

In all these domains, implementation leverages optimized FFT libraries (e.g., FFTW, cuFFT for GPUs), careful data layout for memory throughput, batching for multi-variable or block-structured operators, and, in large-scale settings, distributed FFTs and collective MPI communication.
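Batching over field components can be illustrated with NumPy's FFT interface. This is a small sketch, assuming a component-first array layout `(components, nx, ny)`; the layout and the array sizes are choices made for the example, and production codes would use the batched plan interfaces of libraries such as FFTW or cuFFT instead.

```python
import numpy as np

# A 3-component field (e.g. a displacement vector) on a 16 x 16 periodic grid
rng = np.random.default_rng(0)
field = rng.standard_normal((3, 16, 16))

# One batched call transforms every component over the two spatial axes only
field_hat = np.fft.rfftn(field, axes=(1, 2))

# Equivalent explicit loop over components, shown for comparison
looped = np.stack([np.fft.rfftn(field[a]) for a in range(field.shape[0])])

# Inverse transform; `s` fixes the real-space grid size of the output
field_back = np.fft.irfftn(field_hat, s=field.shape[1:], axes=(1, 2))
```

Using the real-input transform stores roughly half of the Fourier coefficients, which reduces memory traffic for real-valued fields.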

6. Algorithmic Extensions and Theoretical Considerations

Further innovations include:

Matrix-free finite element (FE)-FFT hybridization: Periodic FE discretization with FFT-based block-circulant Green's operator preconditioners, achieving mesh-independent iteration counts and avoiding spectral ringing inherent to full Fourier solvers (Ladecký et al., 2022).
Advanced block-diagonal or local spectral preconditioners: For non-constant operators (e.g., in ECS for Schrödinger equations), preconditioners that adaptively capture spatially varying spectral properties are constructed, often through blocked eigenmode decompositions (Serov et al., 2015).
Projector restrictions for non-symmetric or singular systems: Use of appropriate subspace projectors ensures that the CG recurrences access only symmetric positive definite components, guaranteeing convergence even for systems that are not globally SPD (Vondřejc et al., 2011).

Spectral analysis underpins preconditioner design and convergence bounds. For dyadic inverses or block-diagonalizable circulants, FFTs diagonalize the operators, rendering both application and inversion efficient. Theoretical error reduction is dictated by the spectral radius or the condition number $\kappa$ of the (preconditioned) operator: CG contracts the energy-norm error by a factor $\rho_{\rm CG} \sim (\sqrt{\kappa} - 1)/(\sqrt{\kappa} + 1)$ per iteration, versus $(\kappa - 1)/(\kappa + 1)$ for Richardson-like schemes, so the iteration count scales as $O(\sqrt{\kappa})$ rather than $O(\kappa)$ (Zeman et al., 2010, Mishra et al., 2015, Vondřejc et al., 2011).
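The effect of the condition number on iteration counts can be estimated from the standard worst-case bounds: $\|e_k\|_A \le 2\rho_{\rm CG}^k \|e_0\|_A$ for CG, and $\|e_k\| \le \rho^k \|e_0\|$ with $\rho = (\kappa-1)/(\kappa+1)$ for Richardson iteration with the optimal fixed step. The helper names below are illustrative, and the estimates are upper bounds (valid for $\kappa > 1$), not predictions of observed counts.

```python
import math

def cg_iterations(kappa, eps):
    """Smallest k with 2 * rho**k <= eps, rho = (sqrt(kappa)-1)/(sqrt(kappa)+1)."""
    rho = (math.sqrt(kappa) - 1.0) / (math.sqrt(kappa) + 1.0)
    return math.ceil(math.log(2.0 / eps) / -math.log(rho))

def richardson_iterations(kappa, eps):
    """Smallest k with rho**k <= eps, rho = (kappa-1)/(kappa+1)."""
    rho = (kappa - 1.0) / (kappa + 1.0)
    return math.ceil(math.log(1.0 / eps) / -math.log(rho))

# Worst-case iteration estimates for a relative error reduction of 1e-6
estimates = {kappa: (cg_iterations(kappa, 1e-6), richardson_iterations(kappa, 1e-6))
             for kappa in (10.0, 100.0, 1e4)}
```

For $\kappa = 100$ the bounds give 73 CG iterations against 691 Richardson iterations, and the gap widens as $\kappa$ grows, which is why lowering $\kappa$ through preconditioning pays off so directly.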

7. Implementation Metrics and Benchmarks

CG-FFT methods have $O(N \log N)$ per-iteration cost and $O(N)$ storage requirements, which makes them preferable to approaches with higher complexity (e.g., $O(N^2)$ dense matrix-vector products or $O(N^3)$ dense direct factorizations) for large-scale systems.

Selected benchmark data:

Volume	FASD (s)	FACG (s)
$16^3 \times 32$	25	10.5
$24^3 \times 64$	306	122
$48^3 \times 96$	13,000	6,300

Timings are for Landau gauge fixing; similar speedups are reported in other domains (Hudspith, 2014, Hudspith, 2014).
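The speedups implied by the table follow from a simple ratio of the two timing columns; the short sketch below only restates the tabulated values, with dictionary keys chosen for readability.

```python
# (FASD seconds, FACG seconds) per lattice volume, copied from the table above
timings = {
    "16^3 x 32": (25.0, 10.5),
    "24^3 x 64": (306.0, 122.0),
    "48^3 x 96": (13000.0, 6300.0),
}

# Wall-time speedup of FACG over FASD for each volume
speedups = {volume: fasd / facg for volume, (fasd, facg) in timings.items()}
```

The resulting ratios lie between roughly 2.1 and 2.5, at the lower end of the 2–4× range quoted for lattice gauge fixing.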

For Hessian actions in time-invariant inverse problems, multi-GPU implementations of CG-FFT achieve more than 80% of A100 GPU peak bandwidth and excellent scaling up to 48 GPUs, with per-matvec times below 0.1 s for problems of size $N_t \sim 2 \times 10^3$ and $(N_m,N_d) \sim 10^4$–$10^5$ (Venkat et al., 18 Jul 2024).

References

Jacobi-accelerated FFT-based solver for smooth high-contrast data (Ladecký et al., 4 Aug 2025)
Accelerating a FFT-based solver for numerical homogenization of periodic media by conjugate gradients (Zeman et al., 2010)
Analysis of a Fast Fourier Transform Based Method for Modeling of Heterogeneous Materials (Vondřejc et al., 2011)
A comparative study on low-memory iterative solvers for FFT-based homogenization of periodic media (Mishra et al., 2015)
Fourier Accelerated Conjugate Gradient Lattice Gauge Fixing (Hudspith, 2014)
Conjugate Directions in Lattice Landau and Coulomb Gauge Fixing (Hudspith, 2014)
Fast and Scalable FFT-Based GPU-Accelerated Algorithms for Hessian Actions Arising in Linear Inverse Problems Governed by Autonomous Dynamical Systems (Venkat et al., 18 Jul 2024)
Solution of the Schrödinger equation using exterior complex scaling and fast Fourier transform (Serov et al., 2015)
Optimal FFT-accelerated Finite Element Solver for Homogenization (Ladecký et al., 2022)
Conjugate gradient acceleration of iteratively re-weighted least squares methods (Fornasier et al., 2015)