CFD-Driven Symbolic Closures

Updated 4 January 2026

CFD-driven symbolic closures are a novel approach that uses symbolic regression and evolutionary algorithms to generate physically consistent and computationally efficient closure models.
They employ surrogate-augmented, multi-objective optimization with Gaussian Process models to reduce the number of expensive CFD evaluations (by up to 81% in reported benchmarks) while maintaining accuracy comparable to full CFD-driven training.
The method produces fully interpretable algebraic expressions that preserve tensorial invariance, facilitating transfer between CFD solvers and diverse flow regimes.

CFD-driven symbolic closures are expressly designed to embed data-driven, highly interpretable closure models into Computational Fluid Dynamics (CFD) solvers, with the dual goals of maximizing physical consistency and predictive accuracy while minimizing model complexity and computational cost. These frameworks exploit symbolic regression and evolutionary algorithms to generate closure expressions, embed them into CFD simulations for fitness evaluation, and iteratively optimize their algebraic structure using in-loop simulation data, often within multi-objective and multi-model formulations. Recent advances have further accelerated the model search via surrogate modeling, allowing efficient exploration of candidate closures with substantial reductions in the number of costly CFD runs, all while retaining rigorous multi-objective optimization paradigms and symbolic model transparency (Fang et al., 22 Dec 2025).

1. Principles of Symbolic CFD-Driven Closure Discovery

Symbolic CFD-driven training represents closure models as explicit algebraic formulas, often obtained via symbolic regression methods such as Gene Expression Programming (GEP) or deterministic parameter selection. In a prototypical workflow, candidates $\{f_k\}$ are generated by symbolic regression and evaluated in a CFD solver (typically a Reynolds-averaged Navier–Stokes, RANS, framework) against metrics derived from high-fidelity data (e.g., velocity profiles, temperature fields, scalar error norms). Each candidate closure, parameterized by invariants and tensor bases, is scored by a cost function $J$. The evolutionary cycle operates iteratively: mutation and crossover act on the candidate pool, new closures are embedded and simulated, fitness is measured, and selection is performed, so that the population converges over generations towards the algebraic closures with the lowest cost for the target flow (Zhao et al., 2019, Saïdi et al., 2021, Huijing et al., 2020).

These symbolic closures adhere to tensorial invariance principles, ensuring rotational and Galilean invariance, and exploit minimal, interpretable bases (e.g., $\{S_{ij},\ S_{ik}\Omega_{kj} - \Omega_{ik}S_{kj},\ \ldots\}$), with scaling coefficients expressed as simple polynomials or rational functions of local invariants (e.g., $I_1 = \text{Tr}(S^2)$, $I_2 = \text{Tr}(\Omega^2)$).
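As an illustration of why such a basis preserves tensorial invariance, the following minimal sketch builds a two-term closure from $S$ and $\Omega$ and checks numerically that rotating the velocity gradient rotates the output in the same way. The function names, the coefficient forms `g1` and `g2`, and the constants `c1` and `c2` are hypothetical and not taken from the cited works; the velocity gradient is assumed to be already nondimensionalized by a turbulence time scale.

```python
import numpy as np

def strain_rotation(grad_u):
    """Split a velocity-gradient tensor into strain-rate (S) and rotation-rate (W) parts."""
    S = 0.5 * (grad_u + grad_u.T)
    W = 0.5 * (grad_u - grad_u.T)
    return S, W

def closure(grad_u, c1=-0.09, c2=0.02):
    """Hypothetical closure a = g1(I1, I2) * T1 + g2(I1, I2) * T2."""
    S, W = strain_rotation(grad_u)
    I1 = np.trace(S @ S)           # I1 = Tr(S^2) >= 0
    I2 = np.trace(W @ W)           # I2 = Tr(W^2) <= 0
    T1 = S                         # first basis tensor
    T2 = S @ W - W @ S             # second basis tensor (symmetric, traceless)
    g1 = c1 / (1.0 + I1)           # illustrative rational coefficient
    g2 = c2 / (1.0 + I1 - I2)      # denominator >= 1 because I2 <= 0
    return g1 * T1 + g2 * T2

rng = np.random.default_rng(0)
grad_u = rng.normal(size=(3, 3))
Q, _ = np.linalg.qr(rng.normal(size=(3, 3)))   # random orthogonal matrix

# Transforming the input should transform the output identically (up to round-off).
lhs = closure(Q @ grad_u @ Q.T)
rhs = Q @ closure(grad_u) @ Q.T
print(np.allclose(lhs, rhs))
```

The check works because the invariants are unchanged by an orthogonal transformation while every basis tensor transforms as $Q\,T\,Q^\top$, so any scalar-coefficient combination inherits the same behaviour.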

2. Surrogate-Augmented Symbolic CFD-Driven Training

The computational cost of classical symbolic CFD-driven training is dominated by repeated high-fidelity CFD solves (hundreds to thousands per generation). The surrogate-augmented framework mitigates this by constructing a probabilistic surrogate, specifically a multi-output Gaussian Process (GP), that maps each symbolic model (after aggregation to a continuous descriptor $x\in\mathbb{R}^d$) to predicted error metrics $y\in\mathbb{R}^p$. The surrogate is updated continuously as new CFD evaluations arrive. Only models predicted to yield low errors or carrying high predictive uncertainty are sent to full CFD evaluation; the remainder are scored by the GP predictions. Selection metrics such as lower confidence bounds and expected improvement decide which candidates receive expensive simulation calls, and the surrogate predicts several objective metrics simultaneously (e.g., velocity and temperature errors) by extending the GP kernel to matrix form (Fang et al., 22 Dec 2025).
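One common way to give a GP a matrix-valued kernel is the intrinsic coregionalization model, in which the covariance between output $i$ at $x$ and output $j$ at $x'$ is $B_{ij}\,k(x, x')$. The sketch below implements that construction with NumPy only. It is an assumption that this form resembles the kernel used in the cited work; the function names, the RBF base kernel, and the values of `B`, `length`, and `noise` are illustrative.

```python
import numpy as np

def rbf(A, B, length=1.0):
    """Squared-exponential kernel between row-wise point sets A (n, d) and B (m, d)."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d2 / length**2)

def icm_gp_predict(X, Y, Xs, B, length=1.0, noise=1e-6):
    """Posterior mean and std of a p-output GP with covariance B[i, j] * k(x, x')."""
    n, p = Y.shape
    m = Xs.shape[0]
    K = np.kron(B, rbf(X, X, length)) + noise * np.eye(n * p)
    Ks = np.kron(B, rbf(Xs, X, length))      # test/train cross-covariance
    y = Y.T.ravel()                          # stack outputs one after another
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    mean = Ks @ alpha
    V = np.linalg.solve(L, Ks.T)
    prior_var = np.kron(np.diag(B), np.ones(m))   # k(x, x) = 1 for the RBF kernel
    var = np.maximum(prior_var - (V**2).sum(axis=0), 0.0)
    return mean.reshape(p, m).T, np.sqrt(var).reshape(p, m).T

# Toy data: two correlated error metrics over a 1D descriptor.
X = np.linspace(0.0, 1.0, 6)[:, None]
Y = np.column_stack([np.sin(3 * X[:, 0]), np.sin(3 * X[:, 0]) + 0.1 * X[:, 0]])
B = np.array([[1.0, 0.8], [0.8, 1.0]])       # assumed output covariance

mean, std = icm_gp_predict(X, Y, np.array([[0.5], [10.0]]), B, length=0.2)
print(mean.shape, std.shape)
```

The off-diagonal entries of `B` let an observation of one error metric inform the prediction of the other, which is the practical benefit of a multi-output surrogate over independent single-output GPs.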

The essential workflow:

Initialize candidate population P₀ via symbolic regression
Evaluate all f ∈ P₀ with full CFD → get X₀, Y₀
for gen = 1 to maxGen:
    Train GP surrogate on cumulative CFD data
    Generate new population via GEP
    Map symbolic models to descriptors x_f
    Predict μ(x_f), σ(x_f) with GP
    Compute selection score s(f) (e.g. LCB, EI)
    Select top m₁ for full CFD
    Evaluate selected models with CFD → augment training set
    Assign surrogate value to remaining models
    Feed updated fitness values back to the symbolic regression engine
Return Pareto-optimal or aggregate best closures
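The loop above can be sketched end to end in a toy setting. Everything in the block below is a stand-in chosen for illustration: `run_cfd` is a cheap analytic function rather than a solver, candidates are plain descriptor vectors rather than symbolic expressions, `evolve` is Gaussian mutation rather than GEP, and the surrogate is a single-output GP with fixed hyperparameters. None of these choices is taken from the cited work.

```python
import numpy as np

rng = np.random.default_rng(0)

def run_cfd(x):
    """Stand-in for an expensive CFD solve: scalar error J of a candidate descriptor."""
    return float(np.sum((x - 0.3) ** 2))

def rbf(A, B, length=0.5):
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d2 / length**2)

def gp_predict(X, y, Xs, noise=1e-4):
    """GP posterior mean and std with a constant mean and unit prior variance."""
    K = rbf(X, X) + noise * np.eye(len(X))
    Ks = rbf(Xs, X)
    y0 = y.mean()
    mu = y0 + Ks @ np.linalg.solve(K, y - y0)
    var = 1.0 - np.einsum("ij,ji->i", Ks, np.linalg.solve(K, Ks.T))
    return mu, np.sqrt(np.maximum(var, 0.0))

def evolve(pop, fitness, n_children, step=0.2):
    """Toy variation operator standing in for GEP mutation and crossover."""
    parents = pop[np.argsort(fitness)[: max(2, len(pop) // 2)]]
    picks = parents[rng.integers(len(parents), size=n_children)]
    return picks + step * rng.normal(size=picks.shape)

def train(pop_size=20, max_gen=10, m1=5, beta=2.0, dim=2):
    pop = rng.uniform(-1.0, 1.0, size=(pop_size, dim))
    X = pop.copy()                                   # all CFD-evaluated descriptors
    y = np.array([run_cfd(x) for x in pop])          # their CFD errors
    fitness, n_cfd = y.copy(), pop_size
    for _ in range(max_gen):
        children = evolve(pop, fitness, pop_size)
        mu, sigma = gp_predict(X, y, children)       # surrogate trained on all data
        score = -mu + beta * sigma                   # high = promising or uncertain
        chosen = np.argsort(-score)[:m1]
        fitness = mu.copy()                          # surrogate value by default
        for i in chosen:                             # full CFD for the top m1 only
            fitness[i] = run_cfd(children[i])
            n_cfd += 1
        X = np.vstack([X, children[chosen]])
        y = np.concatenate([y, fitness[chosen]])
        pop = children
    best = int(np.argmin(y))
    return X[best], y[best], n_cfd

best_x, best_J, n_cfd = train()
print(best_x, best_J, n_cfd)
```

With these toy settings the loop issues 20 + 10 × 5 = 70 stand-in CFD calls, against 20 + 10 × 20 = 220 if every child were simulated; the saving in a real study depends on `m1`, the population size, and how well the surrogate ranks candidates.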

Empirical benchmarks across single- and multi-objective cases reveal substantial reductions in CFD evaluations (up to 81%), with final trained closures matching the predictive accuracy of the full CFD-driven approach (Fang et al., 22 Dec 2025).

3. Mapping Symbolic Expressions to Surrogate Inputs

Discrete symbolic expressions lack native smoothness for surrogate modeling. The proposed framework evaluates each closure over a selected set of flow invariants (e.g., strain-rate invariants, temperature gradients) and compresses the results to a $d$-dimensional descriptor (by averaging or other statistical aggregation) suitable for interpolative surrogate fitting. For example, for closure $f$, the descriptor is $x = h(\{f(I_j, J_k)\}) = \tfrac{1}{M}\sum_{j,k}f(I_j, J_k)$, where the sum runs over the $M$ sampled invariant pairs $(I_j, J_k)$ (Fang et al., 22 Dec 2025). This approach enables continuous surrogate predictions for discrete symbolic candidates and extends directly to multi-dimensional invariant spaces.
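A minimal sketch of this averaging descriptor follows. The sample grids and the example closures are hypothetical, and building a multi-component descriptor by stacking one average per expression is an assumption made here for illustration, not a detail confirmed by the source.

```python
import numpy as np

def descriptor(closure, I_samples, J_samples):
    """x = (1/M) * sum over j, k of f(I_j, J_k) on the grid of sampled invariants."""
    I, J = np.meshgrid(I_samples, J_samples, indexing="ij")
    return float(np.mean(closure(I, J)))

def descriptor_vector(closures, I_samples, J_samples):
    """Assumed multi-expression variant: one averaged component per expression."""
    return np.array([descriptor(f, I_samples, J_samples) for f in closures])

I_samples = np.linspace(0.0, 1.0, 5)   # illustrative invariant samples
J_samples = np.linspace(0.0, 2.0, 3)

f_a = lambda I, J: 0.1 * I + J         # one expression tree
f_b = lambda I, J: J + I / 10.0        # a different tree, same function
g = lambda I, J: I * J                 # second expression of a coupled model

x = descriptor_vector([f_a, g], I_samples, J_samples)
print(x)
print(descriptor(f_a, I_samples, J_samples) == descriptor(f_b, I_samples, J_samples))
```

Because the descriptor depends only on the values a closure takes, algebraically equivalent trees such as `f_a` and `f_b` map to (numerically) the same point. Averaging is lossy, however: distinct closures can share a mean, which is one reason the aggregation choice matters for surrogate fidelity.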

4. Multi-Objective Optimization and Selection Metrics

Modern symbolic CFD-driven training incorporates multi-objective optimization both within the closure search and in the surrogate selection process. Multiple error metrics (e.g., $J_u$ for velocity, $J_T$ for temperature) are predicted by the surrogate, and Pareto ranking or scalarization (e.g., aggregated cost, weighted norms) guides selection. The GP surrogate outputs a mean $\mu_i$ and standard deviation $\sigma_i$ per objective $i$, which are combined into selection scores for model evaluation:

Lower-confidence-bound: $m_\mathrm{LCB}(x) = \sum_{i=1}^p [-\mu_i(x)+\beta \sigma_i(x)]$
Expected improvement: $m_\mathrm{EI}(x) = \sum_{i=1}^p \mathrm{EI}_i(x)$
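Both scores can be computed directly from the surrogate's per-objective means and standard deviations, as in the sketch below. The per-objective EI uses the standard closed form for minimization, which the source does not spell out and is therefore an assumption; `beta` and the array values are illustrative. With the sign convention of the formulas as written, a larger score marks a candidate with lower predicted error or higher uncertainty.

```python
import math
import numpy as np

_erf = np.vectorize(math.erf, otypes=[float])

def lcb_score(mu, sigma, beta=2.0):
    """m_LCB(x) = sum_i [-mu_i(x) + beta * sigma_i(x)] for arrays of shape (n, p)."""
    return np.sum(-mu + beta * sigma, axis=1)

def expected_improvement(mu, sigma, best):
    """Per-objective EI for minimization relative to the best observed errors."""
    sigma = np.maximum(sigma, 1e-12)             # avoid division by zero
    z = (best - mu) / sigma
    pdf = np.exp(-0.5 * z**2) / math.sqrt(2.0 * math.pi)
    cdf = 0.5 * (1.0 + _erf(z / math.sqrt(2.0)))
    return (best - mu) * cdf + sigma * pdf

def ei_score(mu, sigma, best):
    """m_EI(x) = sum_i EI_i(x)."""
    return np.sum(expected_improvement(mu, sigma, best), axis=1)

# Two candidates, two objectives (e.g. velocity and temperature errors).
mu = np.array([[1.0, 2.0], [0.5, 2.5]])
sigma = np.array([[0.1, 0.1], [0.3, 0.0]])
best = np.array([0.8, 2.0])                      # best CFD-evaluated errors so far

print(lcb_score(mu, sigma))
print(ei_score(mu, sigma, best))
```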

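Because $m_\mathrm{LCB}$ as written is the negated lower confidence bound $\mu_i - \beta\sigma_i$ summed over objectives, candidates with the highest $m_\mathrm{LCB}$ (equivalently, the lowest lower confidence bound on the error) or the highest $m_\mathrm{EI}$ are passed to full CFD evaluation; the others are scored by the surrogate alone until iteration thresholds are met. This allows full Pareto fronts to be recovered without fixing objective weights in advance, supporting genuinely multi-objective discovery for coupled closures (e.g., turbulent stress and heat-flux optimizations) (Waschkowski et al., 2021, Fang et al., 22 Dec 2025).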
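Pareto ranking rests on a non-dominated filter: a candidate stays on the front if no other candidate is at least as good in every objective and strictly better in one. A minimal sketch, with hypothetical error values and all objectives minimized:

```python
import numpy as np

def pareto_mask(F):
    """Boolean mask of the non-dominated rows of F (n candidates, p objectives)."""
    F = np.asarray(F, dtype=float)
    mask = np.ones(F.shape[0], dtype=bool)
    for i in range(F.shape[0]):
        # Rows that are no worse everywhere and strictly better somewhere.
        dominates_i = np.all(F <= F[i], axis=1) & np.any(F < F[i], axis=1)
        mask[i] = not dominates_i.any()
    return mask

# Columns: (J_u, J_T) for five hypothetical closures.
errors = np.array([
    [0.1, 0.9],
    [0.5, 0.5],
    [0.9, 0.1],
    [0.6, 0.6],   # dominated by the second row
    [0.1, 0.9],   # duplicate of the first row, kept because neither dominates
])
front = pareto_mask(errors)
print(errors[front])
```

This pairwise check costs O(n²p), which is negligible next to a CFD solve for the population sizes typical of symbolic regression.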

5. Representative Results and Computational Efficiency

Four benchmark flow scenarios validate the surrogate-augmented approach: square duct turbulence (2D), vertical natural convection (1D turbulence and heat-flux), horizontal mixed convection (1D coupled), and concentric-annulus natural convection (quasi-2D coupled). Key findings include:

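| Case | Baseline CFD calls | Surrogate-augmented CFD calls | CFD cost saving | Final accuracy (vs. DNS) |
|---|---|---|---|---|
| Square duct | N/A | ~50% fewer | 50% | DNS-level |
| Vertical natural convection | 2000 | 880 | 56% | 9.5% (Nusselt number) |
| Horizontal mixed convection | 1900 | 355 | 81% | DNS-level |
| Concentric-annulus natural convection | 2350 | 1257 | 47% | DNS-level |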

The surrogate-augmented symbolic closure framework achieves predictive accuracy comparable to or better than the original CFD-driven method while cutting the number of CFD evaluations by up to 81%, without loss of DNS-level fidelity in the predicted quantities of interest (Fang et al., 22 Dec 2025).
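The reported savings follow directly from the call counts in the table, as the short check below shows for the three cases with explicit counts (the square-duct case lists no baseline count, so it is omitted).

```python
# (baseline CFD calls, surrogate-augmented CFD calls), copied from the table.
calls = {
    "Vertical natural convection": (2000, 880),
    "Horizontal mixed convection": (1900, 355),
    "Concentric-annulus natural convection": (2350, 1257),
}

def cost_saving(baseline, surrogate):
    """Percentage of CFD calls avoided relative to the baseline."""
    return 100.0 * (baseline - surrogate) / baseline

savings = {case: cost_saving(b, s) for case, (b, s) in calls.items()}
for case, pct in savings.items():
    print(f"{case}: {pct:.1f}% fewer CFD calls")
```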

6. Key Advances and Distinctions

Surrogate-augmented symbolic CFD-driven closure frameworks differ markedly from classical data-algebraic or black-box regression approaches. In particular, such a framework:

Leverages all historical CFD evaluations (not just Pareto-optimal candidates), exploiting both high- and low-performing models to build a nuanced probabilistic error landscape.
Reserves expensive CFD solves for candidates with maximal promise or highest uncertainty, optimizing resource use without compromising search space diversity.
Integrates multi-output Gaussian Process surrogates to track multiple objectives, supporting Pareto-based selection for complex coupled model systems (Fang et al., 22 Dec 2025, Waschkowski et al., 2021).
Produces fully interpretable, algebraic closure expressions that can be examined, analyzed, and transferred between different solvers and flow regimes.

Notably, this paradigm maintains the full physical consistency characteristic of CFD-driven symbolic methods, ensures extensibility to multi-expression and multi-objective cases, and facilitates robust, rapid closure discovery in high-dimensional, costly simulation environments.

7. Research Directions, Limitations, and Outlook

The main practical limitations concern surrogate fidelity and the representation of candidate closure models. Surrogate performance depends on a robust mapping from symbolic trees to continuous descriptors, which is challenging for highly discontinuous or exotic expression forms. The approach has so far been validated on statistically 1D and 2D flows; extension to tensorial, multi-parameter, and high-Reynolds-number systems is anticipated (Fang et al., 22 Dec 2025). Opportunities include integrating the surrogate with adjoint-based sensitivity analysis, adaptive design of experiments, and fusion with SINDy-type frameworks for model parsimony and parameter awareness (Kang et al., 13 Aug 2025).

Taken together, the reduced CFD cost, multi-objective flexibility, and interpretability of surrogate-augmented symbolic CFD-driven closure training make it a strong candidate method for turbulence modeling, closure discovery, and multi-physics CFD integration.