Co-evolutionary Calibration Framework
- The co-evolutionary calibration framework is a method that couples a genetic algorithm with a neural inverse map to rapidly calibrate the Heston stochastic volatility model.
- It achieves efficient parameter space exploration by injecting neural predictions into the GA, reducing RMSE by approximately 15% in synthetic tests.
- Hybrid data strategies, including GA-history and Latin hypercube sampling, are used to balance rapid adaptation with improved out-of-sample robustness.
A co-evolutionary calibration framework couples a genetic algorithm (GA) operating on model parameters with a neural network-based inverse map, which is trained to regress from option price surfaces to Heston model parameters. By allowing these components to co-adapt, the framework injects proposals from the learned neural inverse into the GA population, thereby accelerating exploration and exploitation of parameter space in the context of stochastic-volatility model calibration. The interplay between optimizer-driven samples and amortized inverse learning is central, with data-generation strategies profoundly affecting generalization and robustness (Gutierrez, 3 Dec 2025).
1. Heston Model and Calibration Objective
The co-evolutionary framework targets calibration of the Heston stochastic volatility model. Under the risk-neutral measure $\mathbb{Q}$, the asset price and variance dynamics are governed by:

$$dS_t = r S_t\,dt + \sqrt{v_t}\,S_t\,dW_t^{S}, \qquad dv_t = \kappa(\lambda - v_t)\,dt + \sigma\sqrt{v_t}\,dW_t^{v}, \qquad d\langle W^{S}, W^{v}\rangle_t = \rho\,dt,$$
where the parameter vector is $\theta = (\kappa, \lambda, \sigma, \rho, v_0)$ (mean-reversion speed, long-run variance, volatility of variance, leverage correlation, initial variance), subject to constraints such as the Feller condition $2\kappa\lambda \ge \sigma^2$ and $\rho \in (-1, 1)$. European option prices are computed semi-analytically via Fourier inversion:

$$C(K, T) = S_0 P_1 - K e^{-rT} P_2,$$
where $P_1$ and $P_2$ involve integrals over characteristic functions parameterized by $\theta$. The calibration loss is typically mean squared error over observed market prices:

$$\mathcal{L}(\theta) = \frac{1}{N}\sum_{i=1}^{N}\left(C_{\mathrm{model}}(K_i, T_i; \theta) - C_{\mathrm{market}}(K_i, T_i)\right)^2.$$
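To make the objective concrete, the following Python sketch prices a European call with the semi-analytic Heston formula and evaluates the mean-squared calibration loss over a set of quotes. It uses the numerically stable "little trap" form of the characteristic-function integrals; the integration truncation, quadrature routine, and function names are illustrative choices, not details specified in the source.

```python
import numpy as np
from scipy.integrate import quad

def heston_call(S0, K, T, r, kappa, lam, sigma, rho, v0):
    """European call via C = S0*P1 - K*exp(-r*T)*P2, with P1, P2 computed from
    the Heston characteristic function ('little trap' parameterization)."""
    def integrand(u, j):
        uj, bj = (0.5, kappa - rho * sigma) if j == 1 else (-0.5, kappa)
        a, iu = kappa * lam, 1j * u
        xi = bj - rho * sigma * iu
        d = np.sqrt(xi ** 2 - sigma ** 2 * (2 * uj * iu - u ** 2))
        g = (xi - d) / (xi + d)
        C = r * iu * T + a / sigma ** 2 * (
            (xi - d) * T - 2 * np.log((1 - g * np.exp(-d * T)) / (1 - g)))
        D = (xi - d) / sigma ** 2 * (1 - np.exp(-d * T)) / (1 - g * np.exp(-d * T))
        f = np.exp(C + D * v0 + iu * np.log(S0))
        return np.real(np.exp(-iu * np.log(K)) * f / iu)

    # probabilities P1, P2 via truncated Fourier integrals
    P1 = 0.5 + quad(integrand, 1e-8, 200.0, args=(1,), limit=200)[0] / np.pi
    P2 = 0.5 + quad(integrand, 1e-8, 200.0, args=(2,), limit=200)[0] / np.pi
    return S0 * P1 - K * np.exp(-r * T) * P2

def calibration_loss(theta, quotes, S0, r):
    """MSE calibration loss over market quotes given as [(K, T, price), ...]."""
    kappa, lam, sigma, rho, v0 = theta
    errs = [heston_call(S0, K, T, r, kappa, lam, sigma, rho, v0) - p
            for K, T, p in quotes]
    return float(np.mean(np.square(errs)))
```

This loss is exactly the quantity the GA minimizes; its fitness is the negated loss, as described in Section 2.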
2. Genetic Algorithm Component
Within this framework, the GA maintains a population of candidate parameter vectors $\{\theta_k\}$. Evolutionary steps proceed as follows:
- Population Initialization: individuals are sampled uniformly within the admissible parameter bounds.
- Selection and Elitism: A fixed fraction of top-ranked elites is retained each generation.
- Crossover and Mutation: Arithmetic crossover (convex blending of two parents) and Gaussian mutation (applied with fixed per-offspring and per-parameter probabilities, with perturbation scale proportional to the parameter ranges) diversify the offspring.
- GA Pseudocode:

```
initialize P_GA^{(0)}
for g = 0 … G−1 do
    evaluate fitnesses F_GA
    select elites E
    create offspring via selection, crossover, mutation
    inject neural proposals (see Section 4)
    P_GA^{(g+1)} ← elites + offspring + injected
end for
```

Fitness is the (negated) calibration loss, $F(\theta) = -\mathcal{L}(\theta)$.
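To make one evolutionary step concrete, the NumPy sketch below implements elitism, arithmetic crossover, Gaussian mutation, and the slot for injected neural proposals. The parameter bounds in `BOUNDS`, the elitism fraction, the mutation settings, and the truncation-style parent selection are illustrative placeholders rather than values reported in the source.

```python
import numpy as np

# Illustrative bounds for (kappa, lam, sigma, rho, v0); the paper's exact ranges are not shown here.
BOUNDS = np.array([[0.1, 10.0], [0.01, 1.0], [0.05, 1.5], [-0.95, 0.0], [0.01, 0.5]])

def ga_step(pop, fitness_fn, elite_frac=0.1, mut_prob=0.3, mut_scale=0.1,
            injected=None, rng=None):
    """One generation: elitism, arithmetic crossover, Gaussian mutation, and
    optional replacement of the worst individuals by injected proposals."""
    rng = rng or np.random.default_rng()
    fit = np.array([fitness_fn(ind) for ind in pop])            # negated loss
    order = np.argsort(-fit)                                    # best first
    n_elite = max(1, int(elite_frac * len(pop)))
    elites = pop[order[:n_elite]]

    children = []
    while len(children) < len(pop) - n_elite:
        i, j = rng.choice(order[: len(pop) // 2], size=2, replace=False)  # truncation selection
        w = rng.uniform(0.0, 1.0)
        child = w * pop[i] + (1.0 - w) * pop[j]                 # arithmetic crossover
        if rng.random() < mut_prob:                             # Gaussian mutation
            span = BOUNDS[:, 1] - BOUNDS[:, 0]
            mask = rng.random(len(child)) < 0.5                 # per-parameter flip
            child = child + mask * rng.normal(0.0, mut_scale * span)
        children.append(np.clip(child, BOUNDS[:, 0], BOUNDS[:, 1]))

    new_pop = np.vstack([elites, np.array(children)])
    if injected is not None and len(injected) > 0:              # NN -> GA injection
        worst = np.argsort([fitness_fn(ind) for ind in new_pop])[: len(injected)]
        new_pop[worst] = injected
    return new_pop
```

Here `fitness_fn` is the negated calibration loss, e.g. `lambda th: -calibration_loss(th, quotes, S0, r)` in terms of the pricing sketch above.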
3. Neural Inverse Map Design
The neural inverse component consists of a population of neural networks mapping a flattened option price surface to predicted parameters $\hat{\theta}$. Architectural choices include:
- Input dimension equal to the surface grid size, output dimension $5$.
- A variable number of hidden layers with evolvable widths.
- Activation functions drawn from a small candidate set, also subject to evolutionary change (see the architecture mutation operator below).
Training minimizes the mean squared error between predicted and generating parameters over the training pairs,

$$\mathcal{L}_{\mathrm{NN}} = \frac{1}{M}\sum_{m=1}^{M}\left\|\hat{\theta}(P_m) - \theta_m\right\|_2^2,$$

with Adam optimizer (initial learning rate 0.001, exponential decay 0.9), batch size 64, and a 70/30 train/validation split.
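The following PyTorch sketch renders the inverse map and this training procedure. The optimizer, learning-rate decay, batch size, and 70/30 split follow the values quoted above; the hidden-layer widths, activation, and epoch count are illustrative, since the actual networks evolve these choices.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

class InverseMap(nn.Module):
    """MLP regressing a flattened option-price surface onto the 5 Heston parameters."""
    def __init__(self, surface_dim, hidden=(128, 64), act=nn.ReLU):
        super().__init__()
        layers, d = [], surface_dim
        for h in hidden:
            layers += [nn.Linear(d, h), act()]
            d = h
        layers.append(nn.Linear(d, 5))          # (kappa, lam, sigma, rho, v0)
        self.net = nn.Sequential(*layers)

    def forward(self, x):
        return self.net(x)

def train_inverse(model, surfaces, params, epochs=50):
    """Supervised training on pre-shuffled (surface, parameter) tensors:
    Adam lr=1e-3, exponential decay 0.9, batch size 64, 70/30 split."""
    n_train = int(0.7 * len(surfaces))
    loader = DataLoader(TensorDataset(surfaces[:n_train], params[:n_train]),
                        batch_size=64, shuffle=True)
    opt = torch.optim.Adam(model.parameters(), lr=1e-3)
    sched = torch.optim.lr_scheduler.ExponentialLR(opt, gamma=0.9)
    loss_fn = nn.MSELoss()
    for _ in range(epochs):
        for x, y in loader:
            opt.zero_grad()
            loss = loss_fn(model(x), y)
            loss.backward()
            opt.step()
        sched.step()
    with torch.no_grad():                        # validation loss on the held-out 30%
        val_loss = loss_fn(model(surfaces[n_train:]), params[n_train:]).item()
    return val_loss
```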
Evolutionary operators are also employed:
- Weight crossover and per-parameter Gaussian weight mutation with a fixed mutation probability and perturbation scale.
- Architecture mutation: layer addition/removal, width and activation changes; a fixed survival fraction of networks is carried over each generation.
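A minimal sketch of the weight-level operators in the same PyTorch setting follows; the blend weight, mutation probability, and perturbation scale are illustrative stand-ins for the paper's unspecified values, and architecture mutation is omitted here.

```python
import copy
import torch

def crossover_weights(parent_a, parent_b):
    """Arithmetic crossover on network weights: clone parent_a, then blend every
    tensor that has a matching shape in parent_b with a random weight w."""
    child = copy.deepcopy(parent_a)
    w = torch.rand(()).item()
    with torch.no_grad():
        for p_c, p_b in zip(child.parameters(), parent_b.parameters()):
            if p_c.shape == p_b.shape:           # skip layers that differ in architecture
                p_c.mul_(w).add_(p_b, alpha=1.0 - w)
    return child

def mutate_weights(model, prob=0.05, scale=0.02):
    """Per-parameter Gaussian mutation: each weight is perturbed with probability
    `prob` by zero-mean noise of standard deviation `scale` (illustrative values)."""
    with torch.no_grad():
        for p in model.parameters():
            mask = (torch.rand_like(p) < prob).float()
            p.add_(mask * torch.randn_like(p) * scale)
    return model
```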
4. Co-evolutionary Training and Data Exchange
At each generation, the GA and neural populations interact bidirectionally, resulting in a dynamic data-generation and injection regime:
- GA → NN: The top GA elites generate new neural training samples by pairing each elite's option price surface with its parameter vector. These expand the NN's dataset.
- NN Training/Evolution: Each NN retrains on the augmented set, with fitness metrics including dataset loss and direct calibration quality.
- NN → GA: The top networks are selected. Their predictions on the target surface, perturbed with Gaussian noise, yield injection candidates
$$\tilde{\theta} = \hat{\theta}(P_{\mathrm{target}}) + \varepsilon, \qquad \varepsilon \sim \mathcal{N}(0, \Sigma),$$
which replace the worst portion of the GA population.
- Population Updates: Elitism and evolutionary operators are applied to both GA and NN populations for the next generation.
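The NN → GA injection can be sketched as follows, assuming the PyTorch inverse map above and a bounds array like `BOUNDS` from the GA sketch; the per-network proposal count and noise scale are illustrative choices.

```python
import numpy as np
import torch

def neural_proposals(top_models, target_surface, bounds, n_per_model=3,
                     noise_scale=0.05, rng=None):
    """Turn the top-ranked inverse networks' predictions on the target price surface
    into GA injection candidates by adding Gaussian jitter."""
    rng = rng or np.random.default_rng()
    x = torch.as_tensor(target_surface, dtype=torch.float32)
    span = bounds[:, 1] - bounds[:, 0]
    proposals = []
    for model in top_models:                       # networks already sorted best-first
        with torch.no_grad():
            theta_hat = model(x).numpy()           # predicted (kappa, lam, sigma, rho, v0)
        for _ in range(n_per_model):
            jitter = rng.normal(0.0, noise_scale * span)
            proposals.append(np.clip(theta_hat + jitter, bounds[:, 0], bounds[:, 1]))
    return np.array(proposals)
```

The returned array can be passed as the `injected` argument of `ga_step`, where it overwrites the worst-ranked individuals.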
5. Data Generation Strategies and Their Impact
Two contrasting dataset construction protocols shape generalization and overfitting characteristics:
- GA-History Sampling: Collects only (surface, parameter) pairs from GA elites over generations, resulting in “target-specific sampling” highly concentrated near the current optimum $\theta^{\star}$.
- Latin Hypercube Sampling (LHS): Ensures space-filling coverage by partitioning each dimension into strata, sampling uniformly within them, and combining without replacement. This provides uniform, diverse parameter coverage.
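This stratified construction admits a compact NumPy implementation; the shuffle-per-dimension step below is a standard way to realize the "combining without replacement" rule, and the function name and interface are illustrative.

```python
import numpy as np

def latin_hypercube(n_samples, bounds, rng=None):
    """Latin hypercube sample: split each dimension into n_samples equal strata,
    draw one point per stratum, then shuffle strata independently per dimension."""
    rng = rng or np.random.default_rng()
    dim = len(bounds)
    # one uniform draw inside each stratum, per dimension
    u = (rng.random((n_samples, dim)) + np.arange(n_samples)[:, None]) / n_samples
    for d in range(dim):                      # decouple strata across dimensions
        rng.shuffle(u[:, d])
    lo, hi = bounds[:, 0], bounds[:, 1]
    return lo + u * (hi - lo)
```

Training pairs for the inverse map are then obtained by pricing an option surface for each sampled parameter vector.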
The diversity and representativeness of datasets produced by these strategies determine overfitting and extrapolation properties:
| Strategy | In-sample Loss | Train–Validation Gap | Out-of-sample Stability |
|---|---|---|---|
| GA-History | Rapidly low | Widens with gen. | Poor (overfits target) |
| LHS | Higher | Smaller | Good (generalizes) |
6. Empirical Results in Synthetic and Real Calibration Tasks
Empirical findings demonstrate the quantitative effects of co-evolutionary dynamics and data strategy:
- Synthetic Targets: Co-evolutionary injection reduces RMSE faster than plain GA, achieving approximately 15% lower error at ten generations.
- Time-to-threshold (TTT) vs. LBFGS: Over 20 trials, the median TTT (in GA generations) required to match the LBFGS RMSE indicates comparable calibration speed.
- Neural Architecture Drift: Over the generations, NN depth, average node count, and maximum node count all increase, showing a trend toward higher capacity.
- Learning Curves and Overfitting: Training MSE decreases steadily, but validation error plateaus and the gap widens as generations increase, confirming overfitting.
- Strategy Comparison: GA-history datasets achieve near-zero training loss while validation error remains large, indicating overfitting; LHS datasets yield higher training loss but a smaller train–validation gap, supporting better generalization.
- Real SPX Calibration: On 152 quotes, the table below summarizes calibration loss and parameter errors over generations:
| Gen | Loss | κ err (%) | λ err (%) | σ err (%) | ρ err (%) | v₀ err (%) |
|---|---|---|---|---|---|---|
| 20 | 2.98e-4 | 400.6 | 42.6 | 17.8 | 27.9 | 25.7 |
| 40 | 2.07e-4 | 285.4 | 38.5 | 17.9 | 27.1 | 26.6 |
| 60 | 1.39e-4 | 153.7 | 34.7 | 21.7 | 25.3 | 16.8 |
| 80 | 1.13e-4 | 115.9 | 31.5 | 22.3 | 25.0 | 6.9 |
| 100 | 8.3e-5 | 58.2 | 27.5 | 22.5 | 24.7 | 6.2 |
GA-history–trained inverse models fit the target more tightly in-sample, but this reflects target-dependent fitting rather than a robust global inverse.
7. Practical Guidelines and Limitations
Analysis indicates:
- Specialization and Overfitting: Co-evolutionary specialization arises because GA elites repeatedly sample near the current best estimate $\theta^{\star}$, shrinking diversity and causing the inverse model to memorize rather than learn a functional global inverse. LHS preserves broad coverage, trading in-sample fit for out-of-sample robustness.
- Hybrid Data Regimens: A hybrid technique that combines an initial LHS dataset with periodic GA-history refinement can balance rapid adaptation with preserved generalization. Maintaining a mixed buffer of both sample types is recommended to avoid overfitting (a minimal buffer sketch follows after this list).
- Production Recommendations: Amortized inverse models should be trained on datasets spanning the full plausible parameter space; exclusive reliance on target-specific or optimizer-guided data will reduce robustness and out-of-sample stability.
- Algorithmic Tuning: Adjusting the NN-proposal injection rate and regularizing neural network capacity can mitigate domination or capacity-drift effects.
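As one way to realize the mixed-buffer recommendation, the sketch below caps the share of GA-history pairs relative to a space-filling LHS base set; the `ga_fraction` value and the function interface are assumptions rather than settings from the paper.

```python
import numpy as np

def mixed_buffer(lhs_X, lhs_y, ga_X, ga_y, ga_fraction=0.3, rng=None):
    """Combine space-filling LHS pairs with a capped share of GA-history pairs.
    ga_fraction is the target share of GA-history samples in the final buffer."""
    rng = rng or np.random.default_rng()
    n_ga = min(int(ga_fraction / (1.0 - ga_fraction) * len(lhs_X)), len(ga_X))
    idx = rng.choice(len(ga_X), size=n_ga, replace=False)
    X = np.vstack([lhs_X, ga_X[idx]])
    y = np.vstack([lhs_y, ga_y[idx]])
    perm = rng.permutation(len(X))               # shuffle before training
    return X[perm], y[perm]
```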
In summary, the co-evolutionary calibration framework harnesses neural inverse seeding to accelerate GA-based Heston calibration, but its self-reinforcing data loop risks overfitting without explicit dataset diversification. Latin hypercube sampling remains an effective, easily implemented countermeasure to ensure model generality across unseen implied-volatility surfaces (Gutierrez, 3 Dec 2025).