BAGEL Model: Hierarchical Deflation in Lattice QCD

Updated 3 September 2025

BAGEL Model is a multi-level, hierarchically deflated conjugate gradient algorithm designed to efficiently solve five-dimensional chiral fermion systems in lattice QCD.
It employs blocked deflation and four-dimensional coarse grid strategies to drastically reduce matrix multiplies and wall-clock time in high-precision simulations.
The method integrates preconditioning techniques, including IRS and deflation preconditioners, to accelerate convergence while lowering computational and memory costs.

The BAGEL Model refers to a multi-level, hierarchically deflated conjugate gradient (HDCG) algorithm and its implementation within the BAGEL/Bfm software package for efficiently solving five-dimensional chiral fermion systems, particularly in lattice quantum chromodynamics (QCD). The approach targets the solution of the Dirac equation for formulations such as domain wall and Möbius fermions by accelerating conjugate gradient solvers acting on red–black preconditioned normal equations. By employing hierarchical, blocked deflation and four-dimensional coarse grid strategies, BAGEL substantially reduces both matrix multiplies and wall-clock time in large-scale physical-point simulations on high-performance hardware.

1. Mathematical Framework and Operator Formulation

The foundation of the BAGEL method is the set of Hermitian normal equations for the red–black preconditioned five-dimensional chiral fermion operator. The operator to be inverted is:

$\mathcal{H} = (M_{oo} - M_{oe} M_{ee}^{-1} M_{eo})^{\dagger} (M_{oo} - M_{oe} M_{ee}^{-1} M_{eo}) = M_{\text{prec}}^{\dagger} M_{\text{prec}}$

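Because $\mathcal{H}$ is Hermitian and positive definite, conjugate gradient (CG) techniques are applicable. However, a large condition number $\kappa$ (often $\kappa \sim 10^6$, driven by the low modes) severely slows convergence, since the standard bound on the CG error decays only as $\sigma^k$ after $k$ iterations, with: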

$\sigma = \frac{\sqrt{\kappa} - 1}{\sqrt{\kappa} + 1}$

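For $\kappa \sim 10^6$ this gives $\sigma \approx 0.998$, so each iteration removes only about 0.2% of the error. The strategy therefore centers on multi-level deflation of low eigenmodes, drawing on approaches from algebraic multigrid theory.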
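The convergence rate above can be turned into a rough iteration estimate. The sketch below uses the standard worst-case CG bound, in which the energy-norm error after $k$ iterations is at most $2\sigma^k$ times the initial error. The function names and the target reduction of $10^{-8}$ are illustrative assumptions, not values taken from BAGEL/Bfm.

```python
import math

def cg_rate(kappa):
    """CG contraction factor sigma for condition number kappa."""
    root = math.sqrt(kappa)
    return (root - 1.0) / (root + 1.0)

def cg_iterations(kappa, reduction):
    """Smallest k with 2 * sigma**k <= reduction (worst-case bound)."""
    sigma = cg_rate(kappa)
    return math.ceil(math.log(reduction / 2.0) / math.log(sigma))

# Illustrative target: reduce the error by a factor of 1e-8 (an assumption).
for kappa in (1e2, 1e4, 1e6):
    print(f"kappa={kappa:.0e}  sigma={cg_rate(kappa):.6f}  "
          f"iterations<={cg_iterations(kappa, 1e-8)}")
```

The estimate grows roughly in proportion to $\sqrt{\kappa}$, which is why removing the lowest modes from the problem, and so lowering the effective condition number, pays off.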

2. Blocked Deflation and Coarse Grid Construction

To mitigate the impact of problematic low modes, BAGEL constructs a deflation subspace from “near-null” vectors $\phi_k$ via blocking. For a block $b$ , a blocked subspace vector is defined as:

$\phi_k^b(x) = \begin{cases} \phi_k(x), & x \in b \\ 0, & x \notin b \end{cases}$

With the blocked vectors orthonormalized within each block, the projector onto the subspace $S$ they span is:

$P_S = \sum_{k,b} |\phi_k^b\rangle\langle\phi_k^b|$

“Locally coherent” low modes allow the blocked subspace to efficiently encapsulate slow-to-converge modes. A crucial advancement is that in BAGEL/Bfm, blocking and coarse grid construction are confined to four spacetime dimensions, not the fifth, yielding significant complexity and scaling advantages.

The coarse (“little Dirac”) operator on this subspace is:

$A^{ab}_{jk} = \langle \phi_j^a | M | \phi_k^b \rangle$

and the subspace inverse is $Q = M_{SS}^{-1} = A^{-1}$, where $M_{SS}$ is the restriction of $M$ to $S$. This coarse operator may be truncated spatially to nearest neighbors for efficiency.
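The blocking, projector, and coarse operator defined above can be illustrated on a small dense problem. The following is a minimal NumPy sketch, not the BAGEL/Bfm implementation: the fine operator is a shifted one-dimensional Laplacian, the near-null vectors are simply its exact lowest eigenvectors, and the names `blocked_basis` and `coarse_operator` are hypothetical.

```python
import numpy as np

def blocked_basis(phis, block_size):
    """Restrict each vector to each block and orthonormalize within the block.

    phis: (n, k) array of near-null vectors on n sites.
    Returns an (n, n_blocks * k) matrix with orthonormal columns.
    """
    n, _ = phis.shape
    assert n % block_size == 0
    cols = []
    for start in range(0, n, block_size):
        sl = slice(start, start + block_size)
        q, _ = np.linalg.qr(phis[sl, :])      # on-block orthonormalization
        for j in range(q.shape[1]):
            v = np.zeros(n, dtype=phis.dtype)
            v[sl] = q[:, j]                   # zero outside the block
            cols.append(v)
    return np.stack(cols, axis=1)

def coarse_operator(M, Phi):
    """Coarse matrix A = Phi^dagger M Phi."""
    return Phi.conj().T @ M @ Phi

# Toy fine operator (assumption): shifted 1D Laplacian, Hermitian positive definite.
n, block_size, n_vec = 32, 8, 2
M = 2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1) + 1e-3 * np.eye(n)

# Stand-in near-null vectors (assumption): exact lowest eigenvectors of M.
_, vecs = np.linalg.eigh(M)
phis = vecs[:, :n_vec]

Phi = blocked_basis(phis, block_size)   # 32 x 8: 4 blocks times 2 vectors
A = coarse_operator(M, Phi)             # 8 x 8 coarse operator
P = Phi @ Phi.conj().T                  # projector onto the blocked subspace S

print(np.allclose(P @ phis, phis))      # original vectors lie inside S
```

Because each original vector is the sum of its block restrictions, the blocked subspace contains the unblocked vectors while having `n_blocks` times as many degrees of freedom, which is the sense in which blocking enlarges the deflation space.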

3. Preconditioning Strategies and Acceleration of Conjugate Gradients

BAGEL’s multi-level acceleration combines two critical preconditioners:

Infra-Red Shift (IRS) Preconditioner ($M_{\text{IRS}}$): Applies a fixed number of (often single-precision) CG iterations to the shifted system, approximating $(\mathcal{H} + \lambda)^{-1}$ for an infrared regulator $\lambda$. This smooths high modes.
Deflation Preconditioner (Little Dirac Operator $Q$): Implements blocked projection and inversion to target low modes.

A representative flexible preconditioned CG algorithm is:

r₀ = b − A x₀
z₀ = M_{IRS} r₀
p₀ = z₀

for k = 0, 1, 2, ... until ‖rₖ‖ falls below the tolerance:
    αₖ = (rₖ, zₖ) / (pₖ, A pₖ)
    xₖ₊₁ = xₖ + αₖ pₖ
    rₖ₊₁ = rₖ − αₖ A pₖ
    zₖ₊₁ = M_{IRS} rₖ₊₁
    βₖ = (rₖ₊₁, zₖ₊₁) / (rₖ, zₖ)
    pₖ₊₁ = zₖ₊₁ + βₖ pₖ

Here $A$ incorporates the Schur complement and coarse grid inversion. Notably, convergence of the iteration is sensitive to the precision of $Q$, but tolerant of less accurate $M_{\text{IRS}}$ solves.
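The iteration above can be written as a short runnable routine. The sketch below is a generic preconditioned CG in NumPy, compared with and without a coarse correction on a toy matrix. Everything specific to it is an assumption for illustration: the synthetic spectrum, the use of exact low eigenvectors as deflation vectors, and the simple additive combination of a trivial smoother with the coarse inverse, which is not claimed to be the combination used in BAGEL/Bfm.

```python
import numpy as np

def pcg(apply_A, b, apply_prec, tol=1e-10, max_iter=1000):
    """Preconditioned CG from x0 = 0; returns (solution, iterations used)."""
    x = np.zeros_like(b)
    r = b - apply_A(x)
    z = apply_prec(r)
    p = z.copy()
    rz = np.vdot(r, z).real
    b_norm = np.linalg.norm(b)
    for k in range(max_iter):
        if np.linalg.norm(r) <= tol * b_norm:
            return x, k
        Ap = apply_A(p)
        alpha = rz / np.vdot(p, Ap).real
        x = x + alpha * p
        r = r - alpha * Ap
        z = apply_prec(r)
        rz_new = np.vdot(r, z).real
        p = z + (rz_new / rz) * p
        rz = rz_new
    return x, max_iter

rng = np.random.default_rng(0)
n, n_low = 200, 10
# Toy spectrum (assumption): a few very low modes plus a well-conditioned bulk.
eigs = np.concatenate([np.logspace(-6, -3, n_low),
                       np.linspace(1.0, 2.0, n - n_low)])
U, _ = np.linalg.qr(rng.standard_normal((n, n)))
A = (U * eigs) @ U.T
A = 0.5 * (A + A.T)                  # remove rounding asymmetry
b = rng.standard_normal(n)

V = U[:, :n_low]                     # stand-in deflation vectors: exact low modes
A_c = V.T @ A @ V                    # coarse operator on the deflation subspace

def apply_A(v):
    return A @ v

def identity(r):
    return r

def additive_two_level(r):
    """z = r + V A_c^{-1} V^T r: trivial smoother plus coarse correction."""
    return r + V @ np.linalg.solve(A_c, V.T @ r)

x_plain, it_plain = pcg(apply_A, b, identity)
x_defl, it_defl = pcg(apply_A, b, additive_two_level)
print("plain CG iterations:   ", it_plain)
print("deflated CG iterations:", it_defl)
```

In this toy setting the coarse correction lifts the ten lowest eigenvalues to roughly one, so the preconditioned spectrum lies in about $[1, 2]$ and the iteration count drops accordingly. The coarse solve here is exact; the sensitivity to the precision of $Q$ noted above is not explored.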

4. Implementation in BAGEL/Bfm

Salient implementation features include:

Deflation Subspace Generation: Rational low-pass filters are applied to Gaussian noise vectors $\eta_k$ to preferentially populate low-eigenvalue content:

$\phi_k \propto \frac{1}{(\mathcal{H}+\lambda)(\mathcal{H}+\lambda+\epsilon)\ldots} \eta_k$

Subspace Orthogonalization: On-block orthogonalization expands the effective deflation space.
Coarse Operator Truncation: Restriction to nearest-neighbor couplings, where justified, further reduces cost.
Preconditioner Flexibility: Multiple strategies implemented (AD, DEF1, DEF2, A-DEF1, A-DEF2, BNN), with A-DEF2 found most robust.
Parameterization: User-tunable block sizes, vector numbers, inner/outer solver precisions, and fine/coarse level preconditioners (CG or Chebyshev-based).
Reduced-Precision Communication: Inner preconditioning steps can use reduced (sloppy) precision to save bandwidth without degrading convergence.
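The subspace generation step listed above, a rational low-pass filter applied to Gaussian noise, can be sketched on a small dense operator. This is an illustration under stated assumptions rather than the BAGEL/Bfm code: the operator is a one-dimensional Laplacian, the values of $\lambda$ and $\epsilon$ are arbitrary, the filter uses only two factors, and the inverses are applied with dense solves where a production code would presumably use an iterative solver. The names `low_pass_vectors` and `rayleigh` are hypothetical.

```python
import numpy as np

def low_pass_vectors(H, n_vec, shifts, rng):
    """Apply prod_s (H + s)^{-1} to Gaussian noise; return (noise, filtered)."""
    n = H.shape[0]
    eta = rng.standard_normal((n, n_vec))
    phi = eta.copy()
    for s in shifts:
        phi = np.linalg.solve(H + s * np.eye(n), phi)
    phi /= np.linalg.norm(phi, axis=0)        # normalize each column
    return eta, phi

def rayleigh(H, v):
    """Rayleigh quotient <v|H|v> / <v|v>, a measure of spectral content."""
    return float(v @ H @ v) / float(v @ v)

# Toy Hermitian positive definite operator (assumption): 1D Laplacian.
n = 64
H = 2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)

lam, eps = 1e-3, 1e-3                         # illustrative filter parameters
eta, phi = low_pass_vectors(H, 4, (lam, lam + eps), np.random.default_rng(1))

for k in range(4):
    print(f"vector {k}: noise {rayleigh(H, eta[:, k]):.3f} "
          f"-> filtered {rayleigh(H, phi[:, k]):.5f}")
```

The Rayleigh quotient of the filtered vectors falls far below that of the raw noise, showing that the filter concentrates each vector on the low end of the spectrum, which is the property the deflation subspace needs.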

5. Performance Gains, Scaling, and Comparison

Reported results on IBM BlueGene/Q at the physical point ($48^3 \times 96 \times 24$ lattice):

Method	Time (sec)	Fine Matrix Multiplies	Notes
CG	1270	21,000+	Baseline double-precision
EigCG	320	—	Faster, high setup time (10h), large memory required
HDCG	90	1,600	14× faster than CG, ~3.5× faster than EigCG
HDCG (inexact)	~30	—	Lowered residual tolerance, even faster

A second hierarchy level for deflation reduces coarse grid operation cost by $\sim 10\times$.
Fully four-dimensional construction eliminates scaling with $L_s$ (fifth-dimension extent).
Fine-level matrix multiplies reduced 13-fold relative to regular CG.
Setup cost and memory use are reduced through subspace reuse and the hierarchical design.
Sloppy communication brings bandwidth savings without harming convergence.
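The quoted speed-up factors follow directly from the table. The short check below recomputes them from the tabulated values; the CG multiply count is listed as "21,000+", so the multiply ratio is a lower bound, and the inexact HDCG time is approximate.

```python
# Values copied from the benchmark table above (times in seconds).
timings = {"CG": 1270.0, "EigCG": 320.0, "HDCG": 90.0, "HDCG (inexact)": 30.0}
cg_multiplies, hdcg_multiplies = 21_000, 1_600   # CG count is a lower bound

speedup_vs_cg = {name: timings["CG"] / t for name, t in timings.items()}
speedup_vs_eigcg = timings["EigCG"] / timings["HDCG"]
multiply_ratio = cg_multiplies / hdcg_multiplies

for name, s in speedup_vs_cg.items():
    print(f"{name:15s} {s:5.1f}x faster than CG")
print(f"HDCG vs EigCG: {speedup_vs_eigcg:.2f}x")
print(f"Fine matrix multiplies, CG / HDCG: {multiply_ratio:.1f}x")
```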

6. Comparative Advantages and Trade-offs

Relative to prior approaches (EigCG, five-dimensional solvers without blocking, etc.), BAGEL’s hierarchically deflated strategy:

Reduces wall-clock time and fine matrix multiplies by more than an order of magnitude relative to conventional CG.
Achieves speed-ups of about 14× over conventional CG and about 3.5× over EigCG in the benchmark above, with further gains from the inexact variant.
Reduces setup and persistent memory footprint, critical for large-ensemble simulations.
Combines accurate low-mode deflation (critical to convergence) with cost-efficient high-mode smoothing.
Explicitly avoids fifth-dimensional penalty on the coarse grid, yielding superior scalability.
Tolerates relaxed inner solve criteria for IRS, enabling further resource efficiencies.

7. Applicability, Limitations, and Deployment

The BAGEL model is now an integral part of the BAGEL/Bfm software ecosystem for chiral lattice QCD simulations. Key considerations when deploying or extending this method include:

The precision of coarse operator ($Q$) inversions must be maintained, while high-mode preconditioners can be run at lower precision.
Block sizes, deflation subspace dimension, and inner/outer CG thresholds must be calibrated to lattice size, physical mass, and hardware specifics.
Communication strategies benefit from reduced precision but must be profiled for target machine characteristics.
Compared to approaches like EigCG, BAGEL provides more tractable setup, lower memory, and avoidance of storing large eigenvector sets.

This algorithmic structure and implementation, validated by numerical benchmarks at scale, have enabled practical, efficient simulations of large-volume, physical quark mass ensembles on massively parallel QCD systems, representing a major advance for high-precision lattice calculations (Boyle, 2014).


References (1)

Boyle (2014). Hierarchically deflated conjugate gradient.
