High-Quality Synthetic Financial Time-Series using a GAN-Diffusion Framework

Published 26 May 2026 in cs.LG and cs.AI | (2605.27113v1)

Abstract: In recent years, financial institutions and firms have increasingly adopted synthetic data to address data scarcity and to generate counterfactual market scenarios. However, reproducing all the statistical properties of financial time series, commonly known as stylized facts, remains an open challenge for many existing general-purpose architectures. In this paper, we present a quality-aware generative framework that combines two classes of generative methods, demonstrating how their integration addresses existing limitations while enhancing the realism of synthetic data. Specifically, we first introduce CoMeTS-GAN (Correlated Multivariate Time Series GAN), a Conditional Generative Adversarial Network (C-GAN) designed to jointly generate mid-price and volume time-series for correlated stocks. We then show how our GAN architecture can be incorporated into state-of-the-art diffusion models to enhance the quality of generated correlation structures. Specifically, the GAN's Critic serves as a quality evaluation module that guides the diffusion process, enforcing learned correlation structures in the generated time-series. Our framework offers a lightweight and responsive solution for realistic stock market simulation, explicitly modeling inter-asset correlation structures. We experimentally validate our framework against leading generative architectures, showing that it more effectively captures the stylized facts of stock markets and models inter-asset correlations.

Summary

  • The paper presents a combined C-WGAN and diffusion framework that generates high-fidelity synthetic financial time-series with explicit cross-asset correlation modeling.
  • The methodology employs temporal convolution, spectral normalization, and a critic-guided loss to enforce realism and preserve key market stylized facts.
  • Empirical results confirm the framework's ability to reproduce heavy-tailed returns, volatility clustering, and realistic dependency structures across multiple assets.

Overview of the Proposed Framework

The paper introduces a generative framework for multivariate financial time-series synthesis, leveraging a combination of Conditional Wasserstein GANs (C-WGAN) and diffusion models. The goal is to produce synthetic data that accurately reproduces stylized facts of financial markets—such as heavy-tailed returns, volatility clustering, and realistic cross-asset correlation structures—supporting applications in simulation, risk assessment, hedging, and counterfactual scenario analysis.

The core architecture is a C-WGAN in which both the generator and the critic are explicitly conditioned on past market data and on temporal information. The critic not only scores sample realism but also measures cross-correlation fidelity across assets and features, a domain-specific constraint that helps the model learn the salient correlation dynamics.

Additionally, the critic is repurposed as a guidance module within diffusion models, enabling quality-aware, critic-guided generation at inference. This integration improves the realism of the generated correlation structures, addressing limitations observed in existing general-purpose time-series generation models.

Figure 1: Overview of the framework: a conditional GAN generates correlated multivariate time-series, with its critic guiding a diffusion model to enforce realistic cross-asset dependencies.

Conditional GAN Architecture and Correlation Modeling

The C-WGAN architecture employs temporal convolutional layers and spectral normalization for both generator and critic. The generator receives concatenated inputs of past sequences and noise, with time embeddings injected via sinusoidal position-encoding and processed by a two-layer MLP, enhancing temporal awareness and diversity of outputs. This design supports the generation of arbitrarily long sequences via autoregressive sampling.
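The time-embedding path described above can be sketched in a few lines. This is a minimal illustration in plain Python, assuming a Transformer-style sinusoidal encoding and a ReLU between the two MLP layers; the embedding size, the activation, the random weights, and all function names are assumptions for illustration and are not taken from the paper.

```python
import math
import random

def sinusoidal_embedding(t, dim=8, max_period=10000.0):
    """Transformer-style sinusoidal encoding of a scalar time index t."""
    half = dim // 2
    freqs = [math.exp(-math.log(max_period) * i / half) for i in range(half)]
    return [math.sin(t * f) for f in freqs] + [math.cos(t * f) for f in freqs]

def linear(x, weights, bias):
    """Dense layer: `weights` holds one row of coefficients per output unit."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]

def time_mlp(embedding, params):
    """Two-layer MLP; the ReLU in between is an assumed activation."""
    w1, b1, w2, b2 = params
    hidden = [max(0.0, h) for h in linear(embedding, w1, b1)]
    return linear(hidden, w2, b2)

def init_params(in_dim, hidden_dim, out_dim, seed=0):
    """Random untrained weights, only so that the sketch runs end to end."""
    rng = random.Random(seed)

    def matrix(rows, cols):
        return [[rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]

    return (matrix(hidden_dim, in_dim), [0.0] * hidden_dim,
            matrix(out_dim, hidden_dim), [0.0] * out_dim)

params = init_params(in_dim=8, hidden_dim=16, out_dim=4)
# Feature vector for time step 37; a generator would combine it with
# the past sequence and the noise input.
time_feature = time_mlp(sinusoidal_embedding(t=37), params)
```

In a trained model the MLP weights would be learned jointly with the generator; here they only show the shape of the computation.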

The critic’s output combines standard realism scoring with an explicit cross-correlation distance term, computed as the mean squared error between empirical and synthetic Pearson correlation coefficients over all asset and feature pairs. This distance enters the loss directly, weighted by a hyperparameter, so that the model is optimized for both a realistic joint distribution and a realistic dependency structure.

Figure 2: C-WGAN architecture: generator and critic modules with dedicated correlation evaluation layer for multivariate time-series.
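The cross-correlation distance described above can be illustrated with a short sketch. The plain-Python example below computes Pearson correlations for every pair of channels (assets or features) in a real and a synthetic window and takes the mean squared error between them. The function names, the toy data, and in particular the way the term is added to a loss (`penalized_loss` and its weight `lam`) are assumptions for illustration; the paper's actual implementation is not reproduced here.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length sequences (non-zero variance assumed)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

def correlation_pairs(channels):
    """Correlations for all unordered pairs of channels (assets or features)."""
    k = len(channels)
    return [pearson(channels[i], channels[j])
            for i in range(k) for j in range(i + 1, k)]

def cross_correlation_distance(real, synthetic):
    """Mean squared error between real and synthetic pairwise correlations."""
    real_corr = correlation_pairs(real)
    synth_corr = correlation_pairs(synthetic)
    return sum((r - s) ** 2 for r, s in zip(real_corr, synth_corr)) / len(real_corr)

def penalized_loss(base_loss, real, synthetic, lam=1.0):
    """Assumed combination: base adversarial loss plus the weighted distance."""
    return base_loss + lam * cross_correlation_distance(real, synthetic)

# Toy windows (not market data): two channels that move together in the real
# window and in opposite directions in the synthetic one.
real_window = [[1.0, 2.0, 3.0, 4.0], [2.0, 4.0, 6.0, 8.0]]
synthetic_window = [[1.0, 2.0, 3.0, 4.0], [8.0, 6.0, 4.0, 2.0]]
distance = cross_correlation_distance(real_window, synthetic_window)
```

A synthetic window whose channels have the same pairwise correlations as the real one yields a distance of zero, so the term only penalizes mismatched dependency structure.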

Stylized Facts and Empirical Validation

Experiments are conducted on synthetic benchmarks (sine waves, multivariate Gaussians) and on real LOBSTER mid-price and volume traces for NASDAQ stocks such as KO, PEP, NVDA, and KSU. Evaluation metrics comprise discriminative scores (adversarial realism), diversity (mode collapse avoidance), and preservation of stylized facts: heavy-tailed distributions, aggregational normality, absence of autocorrelation in returns, volatility clustering, and volume-volatility correlation.
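Several of the stylized facts listed above have simple numerical diagnostics. The sketch below, in plain Python, computes three of them on a toy price series: excess kurtosis of log-returns (heavy tails), lag-1 autocorrelation of returns (expected near zero), and lag-1 autocorrelation of absolute returns (a common proxy for volatility clustering). The toy generator with a calm and a turbulent half, the choice of lag, and all names are assumptions for illustration; they are not the paper's evaluation code or data.

```python
import math
import random

def log_returns(prices):
    """Log-returns between consecutive prices."""
    return [math.log(b / a) for a, b in zip(prices, prices[1:])]

def excess_kurtosis(x):
    """Fourth standardized moment minus 3; positive values indicate heavy tails."""
    n = len(x)
    mean = sum(x) / n
    m2 = sum((v - mean) ** 2 for v in x) / n
    m4 = sum((v - mean) ** 4 for v in x) / n
    return m4 / (m2 * m2) - 3.0

def autocorrelation(x, lag):
    """Sample autocorrelation of x at the given lag."""
    n = len(x)
    mean = sum(x) / n
    denom = sum((v - mean) ** 2 for v in x)
    num = sum((x[t] - mean) * (x[t + lag] - mean) for t in range(n - lag))
    return num / denom

def toy_prices(n=2000, seed=0):
    """Toy series with a calm half and a turbulent half (NOT market data)."""
    rng = random.Random(seed)
    price, prices = 100.0, [100.0]
    for t in range(n):
        sigma = 0.001 if t < n // 2 else 0.01
        price *= math.exp(rng.gauss(0.0, sigma))
        prices.append(price)
    return prices

returns = log_returns(toy_prices())
report = {
    "excess_kurtosis": excess_kurtosis(returns),
    "return_acf_lag1": autocorrelation(returns, 1),
    "abs_return_acf_lag1": autocorrelation([abs(r) for r in returns], 1),
}
```

The same functions could be applied to real and generated series side by side, so that each diagnostic is compared rather than read in isolation.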

The model maintains diversity across generated traces (Figure 3), and the cross-correlation distance converges to low values during training (Figure 4), indicating that the correlation structure is learned. Generated price and volume data preserve realistic correlation patterns: KO and PEP keep their expected positive correlation (Figure 5), and generated volume curves reproduce, at least in part, the characteristic U-shaped intraday profile (Figure 6).

Figure 3: Diversity in price generation: synthetic traces exhibit variation across multiple runs while preserving statistical structure.

Figure 4: Average cross-correlation distance during training: convergence to low values signals accurate dependency modeling.

Figure 5: Price - Correlation between KO and other stocks: synthetic series reflect true dependency patterns seen in empirical data.

Figure 6: Real and synthetic volumes: U-shaped intraday volume pattern is partially reproduced in generated sequences.

Key stylized facts are matched quantitatively: fat-tailed log-return distributions (Figure 7), volatility clustering, visible in the autocorrelation of volatility across lags (Figure 8), and robust volume-volatility correlations (Figure 9). Aggregational normality and the absence of autocorrelation in returns are also observed, supporting the model’s ability to replicate financial statistics with high fidelity.

Figure 7: Intraday log-return distribution (Δt = 1): generated sequences capture empirical heavy-tail behavior.

Figure 8: Correlation coefficients of volatility at increasing day lag: synthetic series match the empirical volatility autocorrelation decay.

Figure 10: Pairwise correlation distributions of daily asset prices (390 minutes): synthetic multivariate series closely reproduce real market correlation structures.

Figure 9: Distributions of volume-volatility correlation coefficients over windows of two days: superior accuracy compared to alternative GAN approaches.
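The captions above compare distributions of correlation coefficients computed over fixed windows (one trading day of 390 minutes, or two days). A minimal sketch of how such a comparison could be set up is given below: correlations are computed over consecutive non-overlapping windows, and the real and synthetic distributions are compared with a one-dimensional Wasserstein distance. The function names, the equal-sample-size simplification, and the toy data are assumptions for illustration, not the paper's evaluation pipeline.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length sequences (non-zero variance assumed)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

def windowed_correlations(a, b, window):
    """Correlation of a and b over consecutive non-overlapping windows."""
    starts = range(0, len(a) - window + 1, window)
    return [pearson(a[s:s + window], b[s:s + window]) for s in starts]

def wasserstein_1d(u, v):
    """W1 distance between two empirical distributions with equal sample counts."""
    if len(u) != len(v):
        raise ValueError("this sketch assumes the same number of samples")
    return sum(abs(p - q) for p, q in zip(sorted(u), sorted(v))) / len(u)

# Toy series (not market data), two windows of length 3 each.
real_a, real_b = [1.0, 2.0, 3.0, 3.0, 2.0, 1.0], [2.0, 4.0, 6.0, 1.0, 2.0, 3.0]
synth_a, synth_b = [1.0, 2.0, 3.0, 1.0, 2.0, 3.0], [2.0, 4.0, 6.0, 2.0, 4.0, 6.0]

real_corr = windowed_correlations(real_a, real_b, window=3)
synth_corr = windowed_correlations(synth_a, synth_b, window=3)
gap = wasserstein_1d(real_corr, synth_corr)
```

With minute-level data, `window=390` would correspond to the daily windows mentioned in the Figure 10 caption; a smaller distance means the synthetic correlation distribution sits closer to the real one.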

Comparative Analysis and Scalability

A comprehensive benchmark against state-of-the-art models (COSCI-GAN, GT-GAN, TimeGAN, RCGAN, TTS-GAN) reveals that while certain models demonstrate competitive realism scores on literature benchmarks, the C-WGAN with cross-correlation loss (“with cross-corr.”) excels in modeling financial dependencies and stylized facts, especially for correlation-sensitive applications.

Numerical results indicate that the critic-guided correlation loss term substantially improves performance, especially in settings with high inter-feature correlation, where the model outperforms general-purpose models at generating realistic synthetic financial data.

The model scales linearly with the number of assets and generates concurrent traces for all 30 DJIA components (Figure 11), supporting its use in large-scale risk simulations, portfolio scenarios, and market impact analyses.

Figure 11: Concurrent price generation for the 30 DJIA components: scalable architecture supports high-dimensional synthetic data generation.

Critic-Guided Diffusion and Counterfactual Scenarios

The framework further uses the critic as a guidance signal for diffusion models, modifying the score function to bias generation toward regions that the critic rates as more realistic. Empirical results show that critic-guided diffusion significantly improves the synthetic correlation distributions, reducing their Wasserstein distances to the empirical distributions (see the tables accompanying Figures 10 and 12).
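The idea of modifying the score function with a critic can be illustrated on a toy problem. The sketch below adds a scaled critic gradient to a base score and samples with plain Langevin dynamics, used here as a stand-in for a reverse diffusion process. Everything in it is an assumption for illustration: the standard-Gaussian base score, the hand-written `toy_critic` that prefers two coordinates moving together, the finite-difference gradient, and the guidance `scale`. The paper's learned critic, sampler, and noise schedule are not reproduced.

```python
import math
import random

def base_score(x):
    """Score of a standard Gaussian, standing in for a learned diffusion score."""
    return [-v for v in x]

def toy_critic(x):
    """Hypothetical critic: scores higher when the two coordinates move together."""
    return -0.5 * (x[0] - x[1]) ** 2

def numerical_grad(f, x, h=1e-4):
    """Central finite-difference gradient (a real critic would use autodiff)."""
    grad = []
    for i in range(len(x)):
        up, down = list(x), list(x)
        up[i] += h
        down[i] -= h
        grad.append((f(up) - f(down)) / (2.0 * h))
    return grad

def guided_score(x, scale):
    """Base score plus the scaled critic gradient."""
    critic_grad = numerical_grad(toy_critic, x)
    return [s + scale * g for s, g in zip(base_score(x), critic_grad)]

def sample(scale, rng, steps=150, step_size=0.05):
    """One Langevin chain driven by the guided score."""
    x = [rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)]
    noise = math.sqrt(2.0 * step_size)
    for _ in range(steps):
        score = guided_score(x, scale)
        x = [v + step_size * s + noise * rng.gauss(0.0, 1.0)
             for v, s in zip(x, score)]
    return x

def sample_correlation(scale, n=400, seed=0):
    """Pearson correlation between the two coordinates over n independent chains."""
    rng = random.Random(seed)
    points = [sample(scale, rng) for _ in range(n)]
    xs, ys = [p[0] for p in points], [p[1] for p in points]
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(xs, ys))
    var_x = sum((a - mean_x) ** 2 for a in xs)
    var_y = sum((b - mean_y) ** 2 for b in ys)
    return cov / math.sqrt(var_x * var_y)
```

In this toy setting, `sample_correlation(0.0)` gives roughly uncorrelated coordinates, a positive scale such as `3.0` induces positive correlation, and a small negative scale such as `-0.25` pushes the correlation below zero, which mirrors the idea of reversing the guidance to alter correlation structure.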

Additionally, the architecture supports simulation of counterfactual events: a perturbation applied to one stock (KO) propagates to a correlated asset (PEP) in the generated series, illustrating the model’s capacity to preserve inter-stock dependencies under interventions (Figures 12 and 13). Reversing the guidance enables generation of scenarios with intentionally altered correlation structures, supporting stress-testing and “what-if” market analyses.

Figure 13: Price evolution of KO (top) and PEP (bottom): perturbation induces correlated response, validating model reactivity.

Figure 12: Correlation between KO and PEP varying intensity of perturbation: induced shocks propagate realistically in synthetic series.

Implications and Future Directions

The work advances synthetic financial data generation by providing a lightweight, scalable, and correlation-sensitive solution for simulation and modeling tasks. The critic-guided integration with diffusion models points to a promising direction for quality-aware conditional generative modeling, supporting both realistic and adversarial scenario simulation.

Practically, the framework facilitates back-testing, calibration, and risk management in settings where proprietary or rare-event data are unavailable. Theoretically, the explicit modeling of correlation dynamics and stylized facts establishes a baseline for further exploration of domain-aware GAN-Diffusion combinations in time-series synthesis. Potential future developments include adaptation to other asset classes, dynamic conditioning for online simulation, and further automated integration of constraints that preserve stylized facts.

Conclusion

This paper presents a quality-aware generative framework combining conditional GANs and diffusion models, uniquely tailored for financial time-series with explicit cross-asset dependency modeling. Empirical validation establishes fidelity to stylized facts, scalability, reactivity, and counterfactual simulation capability, outperforming state-of-the-art models in financial realism and correlation structure modeling. The critic-guidance methodology further enhances diffusion-based sampling quality, establishing a foundation for future work in domain-specific synthetic time-series generation and quality-aware simulation tools for financial and economic research.