
Forecasting implied volatility surface with generative diffusion models (2511.07571v1)

Published 10 Nov 2025 in q-fin.CP and q-fin.MF

Abstract: We introduce a conditional Denoising Diffusion Probabilistic Model (DDPM) for generating arbitrage-free implied volatility (IV) surfaces, offering a more stable and accurate alternative to existing GAN-based approaches. To capture the path-dependent nature of volatility dynamics, our model is conditioned on a rich set of market variables, including exponentially weighted moving averages (EWMAs) of historical surfaces, returns and squared returns of the underlying asset, and scalar risk indicators such as the VIX. Empirical results demonstrate our model significantly outperforms leading GAN-based models in capturing the stylized facts of IV dynamics. A key challenge is that historical training data, especially earlier vintages, often contains small arbitrage violations, which conflicts with the goal of generating arbitrage-free surfaces. We address this by incorporating a standard arbitrage penalty into the loss function, applied with a novel, parameter-free weighting scheme based on the signal-to-noise ratio (SNR) that dynamically adjusts the penalty's strength across the diffusion process. We also provide a formal analysis of this trade-off and a proof of convergence showing that the penalty introduces a small, controllable bias that steers the model toward the manifold of arbitrage-free surfaces while ensuring the generated distribution remains close to the real-world data.

Summary

  • The paper proposes a conditional DDPM that generates arbitrage-free IV surfaces by integrating SNR-weighted arbitrage penalties into its training objective.
  • It employs a U-Net based reverse diffusion process with FiLM and sinusoidal embeddings to effectively condition on market data and temporal features.
  • Empirical results show lower MAPE, enhanced calibration, and smoother surface generation compared to traditional GAN-based methods.

Conditional Generative Diffusion Models for Arbitrage-Free Implied Volatility Surface Forecasting

Overview and Motivation

The paper proposes a conditional Denoising Diffusion Probabilistic Model (DDPM) designed for generating and forecasting arbitrage-free implied volatility (IV) surfaces. This framework addresses the inherent difficulties in modeling the IV surface manifold, which is constrained by stringent no-arbitrage conditions such as convexity in strike and monotonicity in tenor. The manifold is non-linear and low-dimensional, embedded inside a higher-dimensional space, requiring any generative model to restrict outputs to the arbitrage-free domain. Previous work relying on parametric models (e.g., Heston), GANs, or VAEs has been limited by mode collapse, unreliable uncertainty calibration, or difficulties in respecting financial constraints.

Model Architecture and Process

Data Representation and Manifold Constraints

Implied volatility surfaces are constructed on a fixed 9×9 grid of moneyness and tenor, using market data subject to intensive cleaning and smoothing (vega-weighted kernel regression). Explicit no-arbitrage penalties, quantifying the extent of violations of the butterfly, calendar-spread, and call-spread constraints, are employed both as diagnostics and as training objectives.
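The finite-difference penalties can be sketched as follows. This is a minimal illustration of the butterfly and calendar checks on a 9×9 grid (the call-spread check is analogous); the paper's exact discretization and normalization are not reproduced here.

```python
import numpy as np

def arbitrage_penalty(iv, tenors):
    """Hedged sketch of finite-difference no-arbitrage penalties on a
    9x9 IV grid (rows = tenors, cols = moneyness). Penalizes:
      - butterfly: negative convexity of total variance in moneyness,
      - calendar:  total variance w = iv**2 * T decreasing in tenor.
    The paper's exact discretization may differ."""
    w = iv**2 * tenors[:, None]                  # total implied variance
    # Butterfly: second difference along moneyness should be >= 0
    d2k = w[:, :-2] - 2.0 * w[:, 1:-1] + w[:, 2:]
    butterfly = np.clip(-d2k, 0.0, None).sum()
    # Calendar: w should be non-decreasing in tenor
    dt = w[1:, :] - w[:-1, :]
    calendar = np.clip(-dt, 0.0, None).sum()
    return butterfly + calendar

# A flat 20% vol surface passes both checks
iv = np.full((9, 9), 0.20)
tenors = np.linspace(0.1, 2.0, 9)
print(arbitrage_penalty(iv, tenors))  # 0.0
```

A spike at a single grid point breaks convexity in moneyness and yields a strictly positive penalty, which is the signal used during training.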

Forward and Reverse Diffusion Processes

The DDPM's forward process perturbs clean surface data through a Markov chain with a pre-defined variance schedule, approaching a standard multivariate Gaussian distribution in the limit. For the volatility surface, a Variance Preserving SDE is adopted, appropriate to the surface's normalized, bounded statistics (Figure 1).

Figure 1: Diffusion process of the implied volatility surface.
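The variance-preserving forward process admits a closed-form marginal, which the sketch below illustrates with a linear beta schedule; the paper's schedule and timestep count are assumptions here.

```python
import numpy as np

# Linear variance schedule (illustrative values, common in DDPM practice)
T = 1000
betas = np.linspace(1e-4, 0.02, T)
alphas_bar = np.cumprod(1.0 - betas)     # cumulative signal retention

def q_sample(x0, t, rng):
    """Draw x_t ~ q(x_t | x_0) = N(sqrt(abar_t) x0, (1 - abar_t) I)."""
    eps = rng.standard_normal(x0.shape)
    x_t = np.sqrt(alphas_bar[t]) * x0 + np.sqrt(1.0 - alphas_bar[t]) * eps
    return x_t, eps

rng = np.random.default_rng(0)
x0 = rng.standard_normal((9, 9))         # a standardized surface
x_T, _ = q_sample(x0, T - 1, rng)
print(alphas_bar[-1])                    # ≈ 4e-5: x_T is near-Gaussian
```

Because the terminal signal weight is tiny, sampling can start from pure Gaussian noise and run the learned reverse chain.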

The reverse process is parameterized by a conditional U-Net approximating the score function. Its inputs are the noisy surface, the diffusion timestep (encoded with sinusoidal positional embeddings), and a vector of conditioning variables (prior-day surfaces, EWMAs of returns, VIX return). Feature-wise linear modulation (FiLM) injects the temporal and market features at multiple network levels, yielding the necessary context awareness.
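The two conditioning mechanisms can be sketched together: a transformer-style sinusoidal embedding for the timestep, and a FiLM layer that rescales and shifts feature maps per channel. Layer sizes and weight shapes are illustrative, not taken from the paper.

```python
import numpy as np

def sinusoidal_embedding(t, dim=16):
    """Standard sinusoidal timestep embedding (transformer-style)."""
    freqs = np.exp(-np.log(10000.0) * np.arange(dim // 2) / (dim // 2))
    ang = t * freqs
    return np.concatenate([np.sin(ang), np.cos(ang)])

def film(h, cond, W_gamma, b_gamma, W_beta, b_beta):
    """FiLM: per-channel affine modulation gamma(c) * h + beta(c),
    with gamma and beta produced from the conditioning vector."""
    gamma = cond @ W_gamma + b_gamma          # shape (channels,)
    beta = cond @ W_beta + b_beta
    return gamma[:, None, None] * h + beta[:, None, None]

rng = np.random.default_rng(0)
C, dim = 8, 16
cond = sinusoidal_embedding(500, dim)         # timestep context
h = rng.standard_normal((C, 9, 9))            # a U-Net feature map
out = film(h, cond,
           rng.standard_normal((dim, C)), np.zeros(C),
           rng.standard_normal((dim, C)), np.zeros(C))
print(out.shape)  # (8, 9, 9)
```

In the full model, the market conditioning vector would be concatenated with the timestep embedding before producing the FiLM parameters.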

Weighted Arbitrage Penalty and SNR Scheduling

A novel feature is the direct use of the arbitrage penalty in the objective function, dynamically modulated by the estimated signal-to-noise ratio (SNR) at each diffusion step. This addresses the instability of surface estimation at high noise levels and ensures that, as the reverse process approaches the clean data manifold, arbitrage violations are minimized. The weighting provides a parameter-free adaptation, making the penalty strong where it is most relevant and suppressed where the denoised estimate is unreliable (Figure 2).

Figure 2: Arbitrage level of the processed dataset, 1996-2023.
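A minimal sketch of an SNR-modulated objective is given below. The weight SNR(t)/(1+SNR(t)) is one natural parameter-free choice that vanishes at high noise and approaches one near the data; the paper's exact normalization, and the λ value, are assumptions here.

```python
import numpy as np

betas = np.linspace(1e-4, 0.02, 1000)
alphas_bar = np.cumprod(1.0 - betas)
snr = alphas_bar / (1.0 - alphas_bar)         # SNR(t) for the VP process

def snr_weight(t):
    """Parameter-free weight in (0, 1): near 1 at low noise (late reverse
    steps), vanishing at high noise where x0_hat is unreliable."""
    return snr[t] / (1.0 + snr[t])

def training_loss(eps, eps_hat, x_t, t, penalty_fn, lam=0.1):
    """MSE noise-prediction loss plus the SNR-weighted arbitrage penalty,
    evaluated on the one-step denoised estimate x0_hat."""
    mse = np.mean((eps - eps_hat) ** 2)
    x0_hat = (x_t - np.sqrt(1.0 - alphas_bar[t]) * eps_hat) \
             / np.sqrt(alphas_bar[t])
    return mse + lam * snr_weight(t) * penalty_fn(x0_hat)

print(snr_weight(0), snr_weight(999))  # near 1 early, near 0 late
```

For the VP process this weight equals ᾱ_t exactly, so the penalty is automatically switched off in the regime where the denoised estimate is dominated by noise.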

Theoretical Guarantees

Under regularity and convexity assumptions on the data distribution and the arbitrage penalty, a convergence proof is established. The result bounds the total variation distance between the generated and empirical distributions, incorporating a bias of order O(λ²) from the arbitrage penalty and showing that, with sufficiently small λ, the model's output remains close to the data while being steered toward the arbitrage-free surface manifold.
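Schematically, the guarantee takes the following form (the notation is ours; the paper's constants and exact error terms are not reproduced):

```latex
\mathrm{TV}\!\left(p_{\lambda},\, p_{\mathrm{data}}\right)
\;\le\; \varepsilon_{\mathrm{diff}} \;+\; C\,\lambda^{2},
```

where ε_diff collects the usual diffusion approximation errors (score estimation, discretization) and Cλ² is the bias induced by the arbitrage penalty; a small λ thus trades a controlled loss of distributional fidelity for proximity to the arbitrage-free manifold.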

Implementation Details

U-Net Conditioning and Sampling

The core model is a U-Net receiving a 4×9×9 spatial tensor (current surface, short-term EWMA, long-term EWMA, and noisy surface) plus a standardized five-dimensional market vector (EWMA returns/volatility and VIX change). Temporal information is encoded via sinusoidal embeddings. The architecture employs FiLM for context modulation, and the training objective combines MSE noise prediction with the weighted arbitrage penalty.
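Assembling the 4×9×9 conditioning tensor from a surface history can be sketched as follows; the EWMA decay factors are illustrative, not the paper's values.

```python
import numpy as np

def ewma_stack(history, x_t, lam_short=0.8, lam_long=0.97):
    """Build the 4x9x9 input tensor: today's surface, short- and
    long-horizon EWMAs of past surfaces, and the noisy diffusion
    input x_t. history is (days, 9, 9), oldest first."""
    def ewma(lam):
        w = lam ** np.arange(len(history))[::-1]   # newest gets weight 1
        w /= w.sum()
        return np.tensordot(w, history, axes=1)
    return np.stack([history[-1], ewma(lam_short), ewma(lam_long), x_t])

rng = np.random.default_rng(0)
hist = rng.standard_normal((60, 9, 9))   # 60 past daily surfaces
x_t = rng.standard_normal((9, 9))
print(ewma_stack(hist, x_t).shape)       # (4, 9, 9)
```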

Training Stability and Deployment

Training utilizes AdamW, gradient clipping, EMA snapshots, and early stopping. The SNR-weighted arbitrage term circumvents the destabilization commonly seen in explicit regularization schemes. At inference, the EMA model is used to generate stochastic samples for the specified market context, producing both mean forecasts and calibrated confidence intervals.
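The EMA snapshotting used at inference can be sketched as below; the decay value and the stand-in optimizer step are illustrative, not the paper's settings.

```python
import numpy as np

class EMA:
    """Exponential moving average of model parameters; the smoothed
    copy, not the raw weights, is used for sampling."""
    def __init__(self, params, decay=0.999):
        self.decay = decay
        self.shadow = {k: v.copy() for k, v in params.items()}

    def update(self, params):
        for k, v in params.items():
            self.shadow[k] = self.decay * self.shadow[k] \
                             + (1.0 - self.decay) * v

params = {"w": np.zeros(3)}
ema = EMA(params)
for step in range(100):
    params["w"] += 0.01          # stand-in for an AdamW + clipping step
    ema.update(params)
print(params["w"][0], ema.shadow["w"][0])   # shadow lags the raw weights
```

The lag of the shadow weights behind the live weights is what damps step-to-step noise in the sampled surfaces.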

Empirical Results

Accuracy, Plausibility, and Calibration

The DDPM outperforms the VolGAN benchmark across metrics: lower overall MAPE (3.0% vs. 3.7%), better calibration (confidence interval breach rates close to theoretical 10%), and visually smoother, artifact-free surfaces. GAN-based models often produce creases and artifacts around the boundaries of the surface grid, especially for short-tenor, low-moneyness points, failing to encode hard manifold constraints.

Arbitrage Analysis

Generated surfaces display arbitrage penalties closely tracking those found in the test data, attesting to the effectiveness of the loss modulation. This is notable given historical data frequently contains violation outliers. Despite the imperfect training set, the model generally outputs surfaces with plausible financial features.

Distributional Fidelity

Higher-order moment analysis and pooled sample histograms indicate substantial matching of skew and fatter tails; however, the diffusion model exhibits slightly higher kurtosis, suggesting a conservative bias towards tail risks. While a benefit for stress testing, further refinement of OTM confidence intervals is warranted to avoid over-conservatism.

Trade-offs and Guidance

The model's main trade-off is in the choice and tuning of the arbitrage penalty weight λ: a large λ enforces plausibility but risks introducing bias away from the empirical distribution. The SNR-modulated schedule mitigates this, but care is still needed when extending the framework to markets where arbitrage frequency or sources differ structurally.

Comparatively, DDPMs offer considerably more stable training and more robust uncertainty quantification compared to GANs in the IV forecasting context. Direct injection of financial constraints and market context into the architecture—leveraging domain-specific diagnostics (finite-difference penalties, market EWMAs, VIX)—generates outputs well aligned with practitioner requirements and regulatory concerns.

Implications and Future Directions

The presented methodology sets a precedent for integrating domain-specific hard or soft constraints into modern generative models, particularly for financial time series and manifold-constrained surfaces. As risk management and derivatives pricing applications require arbitrage-free, uncertainty-quantified forecasts, diffusion-based methods with principled regularization may become standard tools.

Future work should address direct hard-constraint enforcement via output parameterizations, improved calibration of OTM option intervals, and extension to multivariate and cross-asset surfaces. Additionally, assessing computational efficiency for large-scale production deployment (e.g., ultra-low-latency risk systems) remains an important engineering consideration.

Conclusion

This work demonstrates that conditional DDPMs, equipped with dynamic, SNR-weighted arbitrage regularization, can forecast plausible, arbitrage-free implied volatility surfaces with statistical reliability surpassing GAN-based approaches. The approach is theoretically grounded, computationally practical, and adaptable to complex market regimes, marking a significant advance for generative modeling in quantitative finance.
