- The paper proposes a conditional DDPM that generates arbitrage-free IV surfaces by integrating SNR-weighted arbitrage penalties into its training objective.
- It employs a U-Net based reverse diffusion process with FiLM and sinusoidal embeddings to effectively condition on market data and temporal features.
- Empirical results show lower MAPE, enhanced calibration, and smoother surface generation compared to traditional GAN-based methods.
Conditional Generative Diffusion Models for Arbitrage-Free Implied Volatility Surface Forecasting
Overview and Motivation
The paper proposes a conditional Denoising Diffusion Probabilistic Model (DDPM) for generating and forecasting arbitrage-free implied volatility (IV) surfaces. The framework addresses the inherent difficulty of modeling the IV-surface manifold, which is constrained by stringent no-arbitrage conditions such as convexity in strike and monotonicity in tenor. The manifold is non-linear and low-dimensional, embedded in a higher-dimensional ambient space, so any generative model must restrict its outputs to the arbitrage-free domain. Previous approaches relying on parametric models (e.g., Heston), GANs, or VAEs have been limited by mode collapse, unreliable uncertainty calibration, or difficulty respecting financial constraints.
Model Architecture and Process
Data Representation and Manifold Constraints
Implied volatility surfaces are constructed on a fixed 9×9 grid of moneyness and tenor, using market data subject to intensive cleaning and smoothing (vega-weighted kernel regression). Explicit no-arbitrage penalties, quantifying the extent of butterfly, calendar-spread, and call-spread violations, serve both as diagnostics and as training objectives.
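As a concrete illustration, below is a minimal sketch of such finite-difference penalties on a batched moneyness×tenor grid. The function name, the hinge-style penalty, the use of the smile itself as a convexity proxy, and the tenor grid values are assumptions for illustration, not the paper's exact discretization; the call-spread condition is omitted for brevity.

```python
import torch

def arbitrage_penalty(surface: torch.Tensor) -> torch.Tensor:
    """Soft no-arbitrage penalty for a batch of IV surfaces.

    surface: (B, K, T) implied vols on a fixed moneyness (K) x tenor (T) grid.
    Penalizes butterfly (convexity in strike) and calendar (monotone total
    variance in tenor) violations via one-sided hinge terms.
    """
    # Butterfly proxy: second difference along the strike axis should be >= 0;
    # negative second differences (concavity) are penalized.
    d2k = surface[:, 2:, :] - 2.0 * surface[:, 1:-1, :] + surface[:, :-2, :]
    butterfly = torch.relu(-d2k).mean()

    # Calendar: total variance sigma^2 * tau should be non-decreasing in tenor.
    # The tenor grid (in year fractions) is a hypothetical placeholder.
    tenors = torch.linspace(0.05, 2.0, surface.shape[-1])
    total_var = surface.pow(2) * tenors
    dcal = total_var[:, :, 1:] - total_var[:, :, :-1]
    calendar = torch.relu(-dcal).mean()

    return butterfly + calendar
```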
Forward and Reverse Diffusion Processes
The DDPM's forward process perturbs clean surface data through a Markov chain with a pre-defined variance schedule, approaching a standard multivariate Gaussian distribution in the limit. For the volatility surface, a variance-preserving (VP) SDE is adopted, which suits the surface's normalized, bounded statistics.
Figure 1: Diffusion process of the implied volatility surface.
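For concreteness, a minimal sketch of the VP forward corruption follows; the linear beta schedule and the number of steps are assumptions, not values taken from the paper.

```python
import torch

# Variance-preserving (VP) forward diffusion with a linear beta schedule.
T = 1000
betas = torch.linspace(1e-4, 2e-2, T)
alphas = 1.0 - betas
alpha_bars = torch.cumprod(alphas, dim=0)

def q_sample(x0: torch.Tensor, t: torch.Tensor, noise: torch.Tensor) -> torch.Tensor:
    """Sample x_t ~ q(x_t | x_0) = N(sqrt(abar_t) * x_0, (1 - abar_t) * I)."""
    abar = alpha_bars[t].view(-1, *([1] * (x0.dim() - 1)))
    return abar.sqrt() * x0 + (1.0 - abar).sqrt() * noise
```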
The reverse process is parameterized by a conditional U-Net that approximates the score function, taking as inputs the noisy surface, the diffusion timestep (encoded with sinusoidal positional embeddings), and a vector of conditioning variables (prior-day surfaces, EWMAs of returns, and the VIX return). Feature-wise linear modulation (FiLM) injects the temporal and market features at multiple network levels, making the denoiser context-aware.
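These two conditioning ingredients can be sketched as follows; module names and dimensions are illustrative, not the paper's exact architecture.

```python
import math
import torch
import torch.nn as nn

def sinusoidal_embedding(t: torch.Tensor, dim: int = 128) -> torch.Tensor:
    """Standard sinusoidal positional encoding of the diffusion step t."""
    half = dim // 2
    freqs = torch.exp(-math.log(10000.0) * torch.arange(half) / half)
    angles = t.float()[:, None] * freqs[None, :]
    return torch.cat([angles.sin(), angles.cos()], dim=-1)

class FiLMBlock(nn.Module):
    """Conv block whose features are scaled and shifted by a context vector.

    Feature-wise linear modulation: the context (time embedding concatenated
    with market features) produces per-channel gamma and beta.
    """
    def __init__(self, channels: int, ctx_dim: int):
        super().__init__()
        self.conv = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.to_film = nn.Linear(ctx_dim, 2 * channels)

    def forward(self, h: torch.Tensor, ctx: torch.Tensor) -> torch.Tensor:
        gamma, beta = self.to_film(ctx).chunk(2, dim=-1)
        h = self.conv(h)
        return torch.relu(gamma[:, :, None, None] * h + beta[:, :, None, None])
```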
Weighted Arbitrage Penalty and SNR Scheduling
A novel feature is the direct use of the arbitrage penalty in the objective function, dynamically modulated by the estimated signal-to-noise ratio (SNR) at each diffusion step. This addresses the instability of surface estimates at high noise levels and ensures that, as the reverse process approaches the clean data manifold, arbitrage violations are minimized. The weighting adapts without extra tuning parameters: the penalty is strong where it is most informative and attenuated where the denoised estimate is unreliable.
Figure 2: Arbitrage level of the processed dataset, 1996-2023.
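A minimal sketch of such an SNR-modulated weight is shown below. The specific functional form snr/(1+snr) is an assumption chosen to match the qualitative behavior described (near 1 at low noise, near 0 at high noise), not the paper's exact schedule.

```python
import torch

def snr_weight(alpha_bars: torch.Tensor, t: torch.Tensor) -> torch.Tensor:
    """SNR-based weight for the arbitrage penalty at diffusion step t.

    For a VP diffusion, snr_t = abar_t / (1 - abar_t). The weight
    snr / (1 + snr) (which equals abar_t) is ~1 near the clean data and
    ~0 at high noise, where the denoised estimate is unreliable.
    """
    abar = alpha_bars[t]
    snr = abar / (1.0 - abar)
    return snr / (1.0 + snr)
```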
Theoretical Guarantees
Under regularity and convexity assumptions on the data distribution and the arbitrage penalty, a convergence proof is established. The result bounds the total variation distance between the generated and empirical distributions, incorporating an O(λ²) bias from the arbitrage penalty and showing that, for sufficiently small λ, the model's output remains close to the data distribution while being steered toward the arbitrage-free surface manifold.
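Schematically, the bound has the familiar diffusion-convergence structure below; the ε terms are generic placeholders assumed here, with only the O(λ²) bias term taken from the paper:

```latex
\mathrm{TV}\left(p_{\theta},\, p_{\mathrm{data}}\right)
\;\lesssim\;
\underbrace{\epsilon_{\mathrm{score}}}_{\text{score approximation}}
\;+\;
\underbrace{\epsilon_{\mathrm{disc}}}_{\text{time discretization}}
\;+\;
\underbrace{O(\lambda^{2})}_{\text{arbitrage-penalty bias}}
```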
Implementation Details
U-Net Conditioning and Sampling
The core model is a U-Net that receives a 4×9×9 spatial tensor (current surface, short-term EWMA, long-term EWMA, and the noisy surface) plus a standardized 5-dimensional market vector (EWMA returns/volatility and the VIX change). Temporal information is encoded via sinusoidal embeddings. The architecture employs FiLM for context modulation, and training minimizes the MSE of the noise prediction combined with the weighted arbitrage penalty.
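Putting the pieces together, a single training step might look like the sketch below, reusing T, alpha_bars, q_sample, snr_weight, and arbitrage_penalty from the earlier snippets. All names, shapes, and the default λ are illustrative assumptions, and the model is assumed to return an epsilon prediction with the same shape as x0.

```python
import torch
import torch.nn.functional as F

def training_step(model, x0, cond_planes, cond_vec, lam=0.1):
    """One sketched training step.

    x0:          (B, 9, 9) clean target surface.
    cond_planes: (B, 3, 9, 9) context planes (current + short/long EWMA surfaces).
    cond_vec:    (B, 5) standardized market features.
    """
    B = x0.shape[0]
    t = torch.randint(0, T, (B,))
    noise = torch.randn_like(x0)
    x_t = q_sample(x0, t, noise)

    # 4x9x9 input tensor: three conditioning planes plus the noisy surface.
    inp = torch.cat([cond_planes, x_t[:, None]], dim=1)
    eps_pred = model(inp, t, cond_vec)

    # Denoised estimate x0_hat recovered from the predicted noise.
    abar = alpha_bars[t].view(-1, 1, 1)
    x0_hat = (x_t - (1.0 - abar).sqrt() * eps_pred) / abar.sqrt()

    # Noise-prediction MSE plus SNR-weighted arbitrage penalty on x0_hat
    # (batch-averaged weight here for brevity).
    w = snr_weight(alpha_bars, t).mean()
    return F.mse_loss(eps_pred, noise) + lam * w * arbitrage_penalty(x0_hat)
```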
Training Stability and Deployment
Training utilizes AdamW, gradient clipping, EMA snapshots, and early stopping. The SNR-weighted arbitrage term circumvents the destabilization commonly seen in explicit regularization schemes. At inference, the EMA model is used to generate stochastic samples for the specified market context, producing both mean forecasts and calibrated confidence intervals.
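At inference, ancestral DDPM sampling with the EMA weights can be sketched as follows, again reusing the schedule tensors defined above; the sample count and the 90% band are illustrative choices.

```python
import torch

@torch.no_grad()
def sample_surfaces(ema_model, cond_planes, cond_vec, n_samples=200):
    """Draw stochastic surface samples for one market context.

    cond_planes: (1, 3, 9, 9) context planes; cond_vec: (1, 5) market vector.
    Returns the mean forecast and an empirical central 90% band.
    """
    x = torch.randn(n_samples, 9, 9)
    planes = cond_planes.expand(n_samples, -1, -1, -1)
    vec = cond_vec.expand(n_samples, -1)
    for t in reversed(range(T)):
        tt = torch.full((n_samples,), t, dtype=torch.long)
        inp = torch.cat([planes, x[:, None]], dim=1)
        eps = ema_model(inp, tt, vec)                 # assumed shape (N, 9, 9)
        abar, a, b = alpha_bars[t], alphas[t], betas[t]
        # Posterior mean of x_{t-1} given the predicted noise.
        x = (x - b / (1.0 - abar).sqrt() * eps) / a.sqrt()
        if t > 0:
            x = x + b.sqrt() * torch.randn_like(x)    # sigma_t^2 = beta_t
    lo = x.quantile(0.05, dim=0)
    hi = x.quantile(0.95, dim=0)
    return x.mean(dim=0), (lo, hi)
```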
Empirical Results
Accuracy, Plausibility, and Calibration
The DDPM outperforms the VolGAN benchmark across metrics: lower overall MAPE (3.0% vs. 3.7%), better calibration (confidence-interval breach rates close to the theoretical 10%), and visually smoother, artifact-free surfaces. GAN-based models often produce creases and artifacts near the boundaries of the surface grid, especially at short-tenor, low-moneyness points, because they do not encode the hard manifold constraints.
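For reference, the breach-rate diagnostic is straightforward to compute from the sampled bands; a well-calibrated 90% interval should be breached on roughly 10% of test points. Names here are illustrative.

```python
import torch

def breach_rate(realized: torch.Tensor, lo: torch.Tensor, hi: torch.Tensor) -> float:
    """Fraction of realized surface points falling outside the [lo, hi] band."""
    outside = (realized < lo) | (realized > hi)
    return outside.float().mean().item()
```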
Arbitrage Analysis
Generated surfaces display arbitrage penalties that closely track those of the test data, attesting to the effectiveness of the loss modulation. This is notable given that historical data frequently contain violation outliers; despite the imperfect training set, the model generally outputs surfaces with plausible financial features.
Distributional Fidelity
Higher-order moment analysis and pooled-sample histograms indicate that skew and fat tails are matched well; however, the diffusion model exhibits slightly higher kurtosis, suggesting a conservative bias toward tail risk. While this is a benefit for stress testing, further refinement of OTM confidence intervals is warranted to avoid over-conservatism.
Trade-offs and Guidance
The model's main trade-off is in the choice and tuning of the arbitrage penalty weight λ: large λ enforces plausibility but risks introducing bias away from the empirical distribution. The SNR-modulated schedule mitigates this, but care is still needed when extending the framework to markets where arbitrage frequency or sources differ structurally.
DDPMs offer considerably more stable training and more robust uncertainty quantification than GANs in the IV-forecasting context. Direct injection of financial constraints and market context into the architecture, leveraging domain-specific diagnostics (finite-difference penalties, market EWMAs, the VIX), yields outputs well aligned with practitioner requirements and regulatory concerns.
Implications and Future Directions
The presented methodology sets a precedent for integrating domain-specific hard or soft constraints into modern generative models, particularly for financial time series and manifold-constrained surfaces. As risk management and derivatives pricing applications require arbitrage-free, uncertainty-quantified forecasts, diffusion-based methods with principled regularization may become standard tools.
Future work should address hard-constraint enforcement via output parameterizations, improved calibration of OTM option intervals, and extension to multivariate and cross-asset surfaces. Assessing computational efficiency for large-scale production deployment (e.g., ultra-low-latency risk systems) also remains an important engineering consideration.
Conclusion
This work demonstrates that conditional DDPMs, equipped with dynamic, SNR-weighted arbitrage regularization, can forecast plausible, arbitrage-free implied volatility surfaces with statistical reliability surpassing GAN-based approaches. The approach is theoretically grounded, computationally practical, and adaptable to complex market regimes, marking a significant advance for generative modeling in quantitative finance.