Co-evolutionary Calibration Framework
- The co-evolutionary calibration framework is a method that couples a genetic algorithm with a neural inverse map to rapidly calibrate the Heston stochastic volatility model.
- It achieves efficient parameter space exploration by injecting neural predictions into the GA, reducing RMSE by approximately 15% in synthetic tests.
- Hybrid data strategies, including GA-history and Latin hypercube sampling, are used to balance rapid adaptation with improved out-of-sample robustness.
A co-evolutionary calibration framework couples a genetic algorithm (GA) operating on model parameters with a neural network-based inverse map, which is trained to regress from option price surfaces to Heston model parameters. By allowing these components to co-adapt, the framework injects proposals from the learned neural inverse into the GA population, thereby accelerating exploration and exploitation of parameter space in the context of stochastic-volatility model calibration. The interplay between optimizer-driven samples and amortized inverse learning is central, with data-generation strategies profoundly affecting generalization and robustness (Gutierrez, 3 Dec 2025).
1. Heston Model and Calibration Objective
The co-evolutionary framework targets calibration of the Heston stochastic volatility model. Under the risk-neutral measure $\mathbb{Q}$, the asset price and variance dynamics are governed by:

$$dS_t = r S_t\,dt + \sqrt{v_t}\,S_t\,dW_t^{S}, \qquad dv_t = \kappa(\lambda - v_t)\,dt + \sigma\sqrt{v_t}\,dW_t^{v}, \qquad d\langle W^{S}, W^{v}\rangle_t = \rho\,dt,$$
where the parameter vector is $\theta = (\kappa, \lambda, \sigma, \rho, v_0)$ (mean-reversion speed, long-run variance, volatility of variance, leverage correlation, initial variance), subject to constraints such as the Feller condition $2\kappa\lambda \ge \sigma^2$ and $\rho \in (-1, 1)$. European option prices are computed semi-analytically via Fourier inversion:

$$C(K, T) = S_0 P_1 - K e^{-rT} P_2,$$
where $P_1$ and $P_2$ involve integrals over characteristic functions parameterized by $\theta$. The calibration loss is typically mean squared error over observed market prices:

$$\mathcal{L}(\theta) = \frac{1}{N}\sum_{i=1}^{N}\left(C_{\mathrm{model}}(K_i, T_i; \theta) - C_{\mathrm{market}}(K_i, T_i)\right)^2.$$
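To make the objective concrete, the following Python sketch prices a European call with the semi-analytic Heston formula and evaluates the mean-squared calibration loss over a set of quotes. It uses the numerically stable "little trap" form of the characteristic-function integrals; the integration truncation, quadrature routine, and function names are illustrative choices, not details specified in the source.

```python
import numpy as np
from scipy.integrate import quad

def heston_call(S0, K, T, r, kappa, lam, sigma, rho, v0):
    """European call via C = S0*P1 - K*exp(-r*T)*P2, with P1, P2 computed from
    the Heston characteristic function ('little trap' parameterization)."""
    def integrand(u, j):
        uj, bj = (0.5, kappa - rho * sigma) if j == 1 else (-0.5, kappa)
        a, iu = kappa * lam, 1j * u
        xi = bj - rho * sigma * iu
        d = np.sqrt(xi ** 2 - sigma ** 2 * (2 * uj * iu - u ** 2))
        g = (xi - d) / (xi + d)
        C = r * iu * T + a / sigma ** 2 * (
            (xi - d) * T - 2 * np.log((1 - g * np.exp(-d * T)) / (1 - g)))
        D = (xi - d) / sigma ** 2 * (1 - np.exp(-d * T)) / (1 - g * np.exp(-d * T))
        f = np.exp(C + D * v0 + iu * np.log(S0))
        return np.real(np.exp(-iu * np.log(K)) * f / iu)

    # probabilities P1, P2 via truncated Fourier integrals
    P1 = 0.5 + quad(integrand, 1e-8, 200.0, args=(1,), limit=200)[0] / np.pi
    P2 = 0.5 + quad(integrand, 1e-8, 200.0, args=(2,), limit=200)[0] / np.pi
    return S0 * P1 - K * np.exp(-r * T) * P2

def calibration_loss(theta, quotes, S0, r):
    """MSE calibration loss over market quotes given as [(K, T, price), ...]."""
    kappa, lam, sigma, rho, v0 = theta
    errs = [heston_call(S0, K, T, r, kappa, lam, sigma, rho, v0) - p
            for K, T, p in quotes]
    return float(np.mean(np.square(errs)))
```

This loss is exactly the quantity the GA minimizes; its fitness is the negated loss, as described in Section 2.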
2. Genetic Algorithm Component
Within this framework, the GA maintains a population of candidate parameter vectors $\{\theta_k\}$. Evolutionary steps proceed as follows:
- Population Initialization: individuals are sampled uniformly within the admissible parameter bounds.
- Selection and Elitism: A fixed fraction of top-ranked elites is retained each generation.
- Crossover and Mutation: Arithmetic crossover (convex blending of two parents) and Gaussian mutation (applied with fixed per-offspring and per-parameter probabilities, with perturbation scale proportional to the parameter ranges) diversify the offspring.
- GA Pseudocode:

```
initialize P_GA^{(0)}
for g = 0 … G−1 do
    evaluate fitnesses F_GA
    select elites E
    create offspring via selection, crossover, mutation
    inject neural proposals (see Section 4)
    P_GA^{(g+1)} ← elites + offspring + injected
end for
```

Fitness is the (negated) calibration loss, $F(\theta) = -\mathcal{L}(\theta)$.
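To make one evolutionary step concrete, the NumPy sketch below implements elitism, arithmetic crossover, Gaussian mutation, and the slot for injected neural proposals. The parameter bounds in `BOUNDS`, the elitism fraction, the mutation settings, and the truncation-style parent selection are illustrative placeholders rather than values reported in the source.

```python
import numpy as np

# Illustrative bounds for (kappa, lam, sigma, rho, v0); the paper's exact ranges are not shown here.
BOUNDS = np.array([[0.1, 10.0], [0.01, 1.0], [0.05, 1.5], [-0.95, 0.0], [0.01, 0.5]])

def ga_step(pop, fitness_fn, elite_frac=0.1, mut_prob=0.3, mut_scale=0.1,
            injected=None, rng=None):
    """One generation: elitism, arithmetic crossover, Gaussian mutation, and
    optional replacement of the worst individuals by injected proposals."""
    rng = rng or np.random.default_rng()
    fit = np.array([fitness_fn(ind) for ind in pop])            # negated loss
    order = np.argsort(-fit)                                    # best first
    n_elite = max(1, int(elite_frac * len(pop)))
    elites = pop[order[:n_elite]]

    children = []
    while len(children) < len(pop) - n_elite:
        i, j = rng.choice(order[: len(pop) // 2], size=2, replace=False)  # truncation selection
        w = rng.uniform(0.0, 1.0)
        child = w * pop[i] + (1.0 - w) * pop[j]                 # arithmetic crossover
        if rng.random() < mut_prob:                             # Gaussian mutation
            span = BOUNDS[:, 1] - BOUNDS[:, 0]
            mask = rng.random(len(child)) < 0.5                 # per-parameter flip
            child = child + mask * rng.normal(0.0, mut_scale * span)
        children.append(np.clip(child, BOUNDS[:, 0], BOUNDS[:, 1]))

    new_pop = np.vstack([elites, np.array(children)])
    if injected is not None and len(injected) > 0:              # NN -> GA injection
        worst = np.argsort([fitness_fn(ind) for ind in new_pop])[: len(injected)]
        new_pop[worst] = injected
    return new_pop
```

Here `fitness_fn` is the negated calibration loss, e.g. `lambda th: -calibration_loss(th, quotes, S0, r)` in terms of the pricing sketch above.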
3. Neural Inverse Map Design
The neural inverse component consists of a population of neural networks mapping a flattened option price surface to predicted parameters $\hat{\theta}$. Architectural choices include:
- Input dimension equal to the surface grid size, output dimension $5$.
- A variable number of hidden layers with evolvable widths.
- Activation functions drawn from a small candidate set, also subject to evolutionary change (see the architecture mutation operator below).
Training minimizes the mean squared error between predicted and generating parameters over the training pairs,

$$\mathcal{L}_{\mathrm{NN}} = \frac{1}{M}\sum_{m=1}^{M}\left\|\hat{\theta}(P_m) - \theta_m\right\|_2^2,$$

with Adam optimizer (initial learning rate 0.001, exponential decay 0.9), batch size 64, and a 70/30 train/validation split.
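The following PyTorch sketch renders the inverse map and this training procedure. The optimizer, learning-rate decay, batch size, and 70/30 split follow the values quoted above; the hidden-layer widths, activation, and epoch count are illustrative, since the actual networks evolve these choices.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

class InverseMap(nn.Module):
    """MLP regressing a flattened option-price surface onto the 5 Heston parameters."""
    def __init__(self, surface_dim, hidden=(128, 64), act=nn.ReLU):
        super().__init__()
        layers, d = [], surface_dim
        for h in hidden:
            layers += [nn.Linear(d, h), act()]
            d = h
        layers.append(nn.Linear(d, 5))          # (kappa, lam, sigma, rho, v0)
        self.net = nn.Sequential(*layers)

    def forward(self, x):
        return self.net(x)

def train_inverse(model, surfaces, params, epochs=50):
    """Supervised training on pre-shuffled (surface, parameter) tensors:
    Adam lr=1e-3, exponential decay 0.9, batch size 64, 70/30 split."""
    n_train = int(0.7 * len(surfaces))
    loader = DataLoader(TensorDataset(surfaces[:n_train], params[:n_train]),
                        batch_size=64, shuffle=True)
    opt = torch.optim.Adam(model.parameters(), lr=1e-3)
    sched = torch.optim.lr_scheduler.ExponentialLR(opt, gamma=0.9)
    loss_fn = nn.MSELoss()
    for _ in range(epochs):
        for x, y in loader:
            opt.zero_grad()
            loss = loss_fn(model(x), y)
            loss.backward()
            opt.step()
        sched.step()
    with torch.no_grad():                        # validation loss on the held-out 30%
        val_loss = loss_fn(model(surfaces[n_train:]), params[n_train:]).item()
    return val_loss
```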
Evolutionary operators are also employed:
- Weight crossover and per-parameter Gaussian weight mutation with a fixed mutation probability and perturbation scale.
- Architecture mutation: layer addition/removal, width and activation changes; a fixed survival fraction of networks is carried over each generation.
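A minimal sketch of the weight-level operators in the same PyTorch setting follows; the blend weight, mutation probability, and perturbation scale are illustrative stand-ins for the paper's unspecified values, and architecture mutation is omitted here.

```python
import copy
import torch

def crossover_weights(parent_a, parent_b):
    """Arithmetic crossover on network weights: clone parent_a, then blend every
    tensor that has a matching shape in parent_b with a random weight w."""
    child = copy.deepcopy(parent_a)
    w = torch.rand(()).item()
    with torch.no_grad():
        for p_c, p_b in zip(child.parameters(), parent_b.parameters()):
            if p_c.shape == p_b.shape:           # skip layers that differ in architecture
                p_c.mul_(w).add_(p_b, alpha=1.0 - w)
    return child

def mutate_weights(model, prob=0.05, scale=0.02):
    """Per-parameter Gaussian mutation: each weight is perturbed with probability
    `prob` by zero-mean noise of standard deviation `scale` (illustrative values)."""
    with torch.no_grad():
        for p in model.parameters():
            mask = (torch.rand_like(p) < prob).float()
            p.add_(mask * torch.randn_like(p) * scale)
    return model
```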
4. Co-evolutionary Training and Data Exchange
At each generation, the GA and neural populations interact bidirectionally, resulting in a dynamic data-generation and injection regime:
- GA → NN: The top GA elites generate new neural training samples by pairing each elite's option price surface with its parameter vector. These expand the NN's dataset.
- NN Training/Evolution: Each NN retrains on the augmented set, with fitness metrics including dataset loss and direct calibration quality.
- NN → GA: The top networks are selected. Their predictions on the target surface, perturbed with Gaussian noise, yield injection candidates
$$\tilde{\theta} = \hat{\theta}(P_{\mathrm{target}}) + \varepsilon, \qquad \varepsilon \sim \mathcal{N}(0, \Sigma),$$
which replace the worst portion of the GA population.
- Population Updates: Elitism and evolutionary operators are applied to both GA and NN populations for the next generation.
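The NN → GA injection can be sketched as follows, assuming the PyTorch inverse map above and a bounds array like `BOUNDS` from the GA sketch; the per-network proposal count and noise scale are illustrative choices.

```python
import numpy as np
import torch

def neural_proposals(top_models, target_surface, bounds, n_per_model=3,
                     noise_scale=0.05, rng=None):
    """Turn the top-ranked inverse networks' predictions on the target price surface
    into GA injection candidates by adding Gaussian jitter."""
    rng = rng or np.random.default_rng()
    x = torch.as_tensor(target_surface, dtype=torch.float32)
    span = bounds[:, 1] - bounds[:, 0]
    proposals = []
    for model in top_models:                       # networks already sorted best-first
        with torch.no_grad():
            theta_hat = model(x).numpy()           # predicted (kappa, lam, sigma, rho, v0)
        for _ in range(n_per_model):
            jitter = rng.normal(0.0, noise_scale * span)
            proposals.append(np.clip(theta_hat + jitter, bounds[:, 0], bounds[:, 1]))
    return np.array(proposals)
```

The returned array can be passed as the `injected` argument of `ga_step`, where it overwrites the worst-ranked individuals.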
5. Data Generation Strategies and Their Impact
Two contrasting dataset construction protocols shape generalization and overfitting characteristics:
- GA-History Sampling: Collects only (surface, parameter) pairs from GA elites over generations, resulting in “target-specific sampling” highly concentrated near the current optimum $\theta^{\star}$.
- Latin Hypercube Sampling (LHS): Ensures space-filling coverage by partitioning each dimension into strata, sampling uniformly within them, and combining without replacement. This provides uniform, diverse parameter coverage.
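This stratified construction admits a compact NumPy implementation; the shuffle-per-dimension step below is a standard way to realize the "combining without replacement" rule, and the function name and interface are illustrative.

```python
import numpy as np

def latin_hypercube(n_samples, bounds, rng=None):
    """Latin hypercube sample: split each dimension into n_samples equal strata,
    draw one point per stratum, then shuffle strata independently per dimension."""
    rng = rng or np.random.default_rng()
    dim = len(bounds)
    # one uniform draw inside each stratum, per dimension
    u = (rng.random((n_samples, dim)) + np.arange(n_samples)[:, None]) / n_samples
    for d in range(dim):                      # decouple strata across dimensions
        rng.shuffle(u[:, d])
    lo, hi = bounds[:, 0], bounds[:, 1]
    return lo + u * (hi - lo)
```

Training pairs for the inverse map are then obtained by pricing an option surface for each sampled parameter vector.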
The diversity and representativeness of datasets produced by these strategies determine overfitting and extrapolation properties:
| Strategy | In-sample Loss | Train–Validation Gap | Out-of-sample Stability |
|---|---|---|---|
| GA-History | Rapidly low | Widens with gen. | Poor (overfits target) |
| LHS | Higher | Smaller | Good (generalizes) |
6. Empirical Results in Synthetic and Real Calibration Tasks
Empirical findings demonstrate the quantitative effects of co-evolutionary dynamics and data strategy:
- Synthetic Targets: Co-evolutionary injection reduces RMSE faster than plain GA, achieving approximately 15% lower error at ten generations.
- Time-to-threshold (TTT) vs. LBFGS: Over 20 trials, the median TTT (in GA generations) required to match the LBFGS RMSE indicates comparable calibration speed.
- Neural Architecture Drift: Over the generations, NN depth, average node count, and maximum node count all increase, showing a trend toward higher capacity.
- Learning Curves and Overfitting: Training MSE decreases steadily, but validation error plateaus and the gap widens as generations increase, confirming overfitting.
- Strategy Comparison: GA-history datasets achieve near-zero training loss while validation error remains large, indicating overfitting; LHS datasets yield higher training loss but a smaller train–validation gap, supporting better generalization.
- Real SPX Calibration: On 152 quotes, the table below summarizes calibration loss and parameter errors over generations:
| Gen | Loss | κ err (%) | λ err (%) | σ err (%) | ρ err (%) | v₀ err (%) |
|---|---|---|---|---|---|---|
| 20 | 2.98e-4 | 400.6 | 42.6 | 17.8 | 27.9 | 25.7 |
| 40 | 2.07e-4 | 285.4 | 38.5 | 17.9 | 27.1 | 26.6 |
| 60 | 1.39e-4 | 153.7 | 34.7 | 21.7 | 25.3 | 16.8 |
| 80 | 1.13e-4 | 115.9 | 31.5 | 22.3 | 25.0 | 6.9 |
| 100 | 8.3e-5 | 58.2 | 27.5 | 22.5 | 24.7 | 6.2 |
GA-history–trained inverse models fit the target more tightly in-sample, but this reflects target-dependent fitting rather than a robust global inverse.
7. Practical Guidelines and Limitations
Analysis indicates:
- Specialization and Overfitting: Co-evolutionary specialization arises because GA elites repeatedly sample near the current best estimate $\theta^{\star}$, shrinking diversity and causing the inverse model to memorize rather than learn a functional global inverse. LHS preserves broad coverage, trading in-sample fit for out-of-sample robustness.
- Hybrid Data Regimens: A hybrid technique that combines an initial LHS dataset with periodic GA-history refinement can balance rapid adaptation with preserved generalization. Maintaining a mixed buffer of both sample types is recommended to avoid overfitting (a minimal buffer sketch follows after this list).
- Production Recommendations: Amortized inverse models should be trained on datasets spanning the full plausible parameter space; exclusive reliance on target-specific or optimizer-guided data will reduce robustness and out-of-sample stability.
- Algorithmic Tuning: Adjusting the NN-proposal injection rate and regularizing neural network capacity can mitigate domination or capacity-drift effects.
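As one way to realize the mixed-buffer recommendation, the sketch below caps the share of GA-history pairs relative to a space-filling LHS base set; the `ga_fraction` value and the function interface are assumptions rather than settings from the paper.

```python
import numpy as np

def mixed_buffer(lhs_X, lhs_y, ga_X, ga_y, ga_fraction=0.3, rng=None):
    """Combine space-filling LHS pairs with a capped share of GA-history pairs.
    ga_fraction is the target share of GA-history samples in the final buffer."""
    rng = rng or np.random.default_rng()
    n_ga = min(int(ga_fraction / (1.0 - ga_fraction) * len(lhs_X)), len(ga_X))
    idx = rng.choice(len(ga_X), size=n_ga, replace=False)
    X = np.vstack([lhs_X, ga_X[idx]])
    y = np.vstack([lhs_y, ga_y[idx]])
    perm = rng.permutation(len(X))               # shuffle before training
    return X[perm], y[perm]
```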
In summary, the co-evolutionary calibration framework harnesses neural inverse seeding to accelerate GA-based Heston calibration, but its self-reinforcing data loop risks overfitting without explicit dataset diversification. Latin hypercube sampling remains an effective, easily implemented countermeasure to ensure model generality across unseen implied-volatility surfaces (Gutierrez, 3 Dec 2025).