Synthetic Returns in Quantitative Finance

Updated 5 March 2026

Synthetic returns are artificially generated asset return sequences produced by generative models and stochastic processes to replicate key statistical patterns observed in financial markets.
They enable privacy-preserving experimentation, stress testing, and algorithm development by extending historical datasets and reproducing features like heavy tails and volatility clustering.
Generated with approaches such as deep generative models and multifractal processes, synthetic returns support robust portfolio optimization, risk assessment, and algorithm training in quantitative finance.

Synthetic returns are artificially generated sequences or distributions of asset returns that serve as substitutes for, or extensions of, historical data in applications such as portfolio construction, risk modeling, statistical arbitrage, and, in reinforcement learning, long-term credit assignment. They appear across financial econometrics, machine learning for market data, reinforcement learning, and decentralized finance, where they enable privacy-preserving quantitative experimentation, robust scenario analysis, and algorithmic innovation in settings with restricted data availability or long-term informational dependencies.

1. Conceptual Framework and Motivations

Synthetic returns are defined as time series or cross-sectional samples of asset returns produced by generative models, parametric stochastic processes, or algorithmic transformations, with the aim of replicating the key statistical, temporal, and economic features observed in real financial markets. Their utility derives from:

Data accessibility and privacy: Synthetic datasets facilitate research and experimentation without risking commercial or privacy-sensitive exposure of proprietary historical returns (Hounwanou et al., 25 Dec 2025).
Augmentation and stress testing: They extend the effective sample space for rare event modeling, stress scenarios, and sensitivity analysis—particularly valuable in risk management workflows and regulatory regimes (Cetingoz et al., 7 Jan 2025).
Algorithm development: Synthetic returns provide reproducible, modifiable environments for training and benchmarking trading strategies, portfolio optimizations, or reinforcement learning agents facing long-term credit assignment challenges (Raposo et al., 2021).

Critically, the effectiveness of synthetic returns relies on their fidelity to "stylized facts"—heavy tails, volatility clustering, temporal dependencies, and cross-sectional correlation structures characterizing real asset returns.

2. Generative Modeling Methodologies

State-of-the-art approaches for generating synthetic return series can be classified into deep generative models, multifractal or parametric stochastic processes, and synthetic derivatives constructs; a related but distinct use of the term arises in reinforcement learning:

A. Deep Generative Models

TimeGAN: Combines an embedding autoencoder with an adversarial generator/discriminator architecture, capturing both the marginal and the dynamic features of return sequences. Embedding and recovery networks ($E$, $R$) encode and decode real trajectories; the generator ($G$) and discriminator ($D$) enforce indistinguishability between real and generated returns in both latent and data space. The overall loss is:

$\mathcal{L}_{\text{TimeGAN}} = \mathcal{L}_{\text{rec}} + \alpha_{\text{sup}}\mathcal{L}_{\text{sup}} + \alpha_{\text{adv}}\mathcal{L}_{\text{adv}}$

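where $\mathcal{L}_{\text{rec}}$ is the autoencoder reconstruction loss, $\mathcal{L}_{\text{sup}}$ is a supervised latent-space loss that trains the model to predict each next latent state from the previous one (so that generated sequences follow the real stepwise dynamics), and $\mathcal{L}_{\text{adv}}$ is the adversarial loss (Hounwanou et al., 25 Dec 2025).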
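
To make the three terms concrete, the following NumPy sketch evaluates them on stand-in arrays. It assumes hypothetical network outputs (`x_tilde` for the recovered returns $R(E(x))$, `h_sup` for a next-step supervisor applied to the embeddings, and discriminator probabilities `d_real`, `d_fake`) and illustrative weights `alpha_sup`, `alpha_adv`. It is not the reference TimeGAN code, which updates the networks in alternating adversarial steps rather than minimizing one scalar.

```python
import numpy as np

def bce(p, target):
    """Binary cross-entropy between probabilities p and a constant label."""
    p = np.clip(p, 1e-7, 1.0 - 1e-7)
    return float(-np.mean(target * np.log(p) + (1.0 - target) * np.log(1.0 - p)))

def timegan_loss(x, x_tilde, h, h_sup, d_real, d_fake, alpha_sup=10.0, alpha_adv=1.0):
    """Weighted sum L_rec + alpha_sup * L_sup + alpha_adv * L_adv (illustrative weights).

    x, x_tilde     : (batch, T, n_assets) real returns and their reconstruction R(E(x)).
    h, h_sup       : (batch, T, d) real embeddings and supervisor outputs, where
                     h_sup[:, t] is the supervisor's prediction of h[:, t + 1].
    d_real, d_fake : discriminator probabilities on real / generated latent sequences.
    """
    l_rec = float(np.mean((x - x_tilde) ** 2))
    l_sup = float(np.mean((h[:, 1:] - h_sup[:, :-1]) ** 2))
    # Discriminator-side adversarial loss; the generator is trained to oppose it.
    l_adv = bce(d_real, 1.0) + bce(d_fake, 0.0)
    total = l_rec + alpha_sup * l_sup + alpha_adv * l_adv
    return total, {"rec": l_rec, "sup": l_sup, "adv": l_adv}

rng = np.random.default_rng(0)
x = rng.normal(0.0, 0.01, size=(4, 20, 3))
h = rng.normal(size=(4, 20, 8))
h_sup = np.roll(h, -1, axis=1)   # a "perfect" supervisor: h_sup[:, t] == h[:, t + 1]
total, parts = timegan_loss(x, x.copy(), h, h_sup,
                            d_real=np.full(4, 0.5), d_fake=np.full(4, 0.5))
```

With a perfect reconstruction and a perfect supervisor only the adversarial term remains, and at a maximally confused discriminator output of 0.5 it equals $2\ln 2$.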

Variational Autoencoders (VAEs) and Normalizing Flows: VAEs map sequences to stochastic latent codes ( $q_\phi(\mathbf{z}\mid \mathbf{r}_{1:T})$ ) and decode into returns, optimizing the evidence lower bound (ELBO); normalizing flows further enable tractable high-dimensional density estimation and sampling (Tepelyan et al., 2023). Conditional models capture cross-asset dependencies and volatility-leverage structure.
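
For a Gaussian encoder and decoder, the ELBO is $\mathbb{E}_{q_\phi}[\log p_\theta(\mathbf{r}_{1:T}\mid\mathbf{z})] - \mathrm{KL}(q_\phi(\mathbf{z}\mid\mathbf{r}_{1:T})\,\|\,p(\mathbf{z}))$. The NumPy sketch below estimates it with one reparameterized sample and the closed-form Gaussian KL against a standard-normal prior. The encoder outputs (`enc_mu`, `enc_logvar`), the linear decoder `W`, and the fixed decoder variance are hypothetical stand-ins for trained networks.

```python
import numpy as np

def gaussian_vae_elbo(r, enc_mu, enc_logvar, decode, dec_logvar, rng):
    """One-sample Monte Carlo ELBO for a single return window r of shape (T,).

    q(z | r) = N(enc_mu, diag(exp(enc_logvar))), prior p(z) = N(0, I),
    p(r | z) = N(decode(z), exp(dec_logvar) * I).
    """
    eps = rng.standard_normal(enc_mu.shape)
    z = enc_mu + np.exp(0.5 * enc_logvar) * eps            # reparameterization trick
    r_mean = decode(z)
    # Gaussian log-likelihood of the returns, summed over the T time steps.
    log_lik = -0.5 * np.sum(np.log(2.0 * np.pi) + dec_logvar
                            + (r - r_mean) ** 2 / np.exp(dec_logvar))
    # Closed-form KL( N(mu, diag var) || N(0, I) ).
    kl = 0.5 * np.sum(np.exp(enc_logvar) + enc_mu ** 2 - 1.0 - enc_logvar)
    return float(log_lik - kl)

rng = np.random.default_rng(1)
T, latent_dim = 20, 4
W = rng.normal(0.0, 0.01, size=(T, latent_dim))            # hypothetical linear decoder
r = rng.normal(0.0, 0.01, size=T)
elbo = gaussian_vae_elbo(r, enc_mu=np.zeros(latent_dim), enc_logvar=np.zeros(latent_dim),
                         decode=lambda z: W @ z, dec_logvar=np.log(0.01 ** 2), rng=rng)
```

When the approximate posterior equals the prior, the KL term vanishes and the ELBO reduces to the expected reconstruction log-likelihood; moving `enc_mu` away from zero adds a KL cost of $\tfrac12\|\mu\|^2$.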

B. Multifractal and Parametric Stochastic Processes

Multifractal Random Walk (MRW): Generates returns as

$X_n = \sum_{k=1}^n \epsilon_k e^{\omega_k}$

with $\epsilon_k$ i.i.d. Gaussian or Student-t innovations and $\omega_k$ a Gaussian log-volatility process whose covariance decays logarithmically with the lag up to an integral scale, mimicking a multiplicative cascade. MRW reproduces multifractal scaling laws by construction and yields tail exponents consistent with empirically observed return distributions (Morales et al., 2012).
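
A direct simulation of the MRW equation above can be sketched in NumPy. The sketch assumes the standard discrete parameterization of Bacry, Delour and Muzy, which may differ in detail from the variant used by Morales et al. (2012): $\mathrm{Cov}(\omega_k, \omega_l) = \lambda^2 \ln \rho(|k-l|)$ with $\rho(\tau) = L/(\tau+1)$ for lags $\tau < L$ (the integral scale) and $\rho = 1$ beyond, and $\mathbb{E}[\omega_k] = -\mathrm{Var}(\omega_k)$ so that $\mathbb{E}[e^{2\omega_k}] = 1$. The parameter names `lam`, `integral_scale`, `sigma`, and `df` are illustrative.

```python
import numpy as np

def simulate_mrw(n_steps, lam=0.1, integral_scale=512, sigma=0.01, df=None, seed=None):
    """Simulate MRW increments sigma * eps_k * exp(omega_k) and their cumulative sum X_n."""
    rng = np.random.default_rng(seed)
    idx = np.arange(n_steps)
    lags = np.abs(idx[:, None] - idx[None, :])
    # rho(tau) = L / (tau + 1) inside the integral scale; log(rho) = 0 beyond it.
    rho = np.where(lags < integral_scale, integral_scale / (lags + 1.0), 1.0)
    cov = lam ** 2 * np.log(rho)
    # Symmetric eigendecomposition; clip tiny negative eigenvalues from round-off.
    w, v = np.linalg.eigh(cov)
    factor = v * np.sqrt(np.clip(w, 0.0, None))
    var_omega = lam ** 2 * np.log(integral_scale)
    # Mean -Var(omega) makes E[exp(2 omega)] = 1, so sigma is the overall volatility.
    omega = -var_omega + factor @ rng.standard_normal(n_steps)
    if df is None:
        eps = rng.standard_normal(n_steps)
    else:  # unit-variance Student-t innovations (requires df > 2)
        eps = rng.standard_t(df, n_steps) * np.sqrt((df - 2.0) / df)
    increments = sigma * eps * np.exp(omega)
    return increments, np.cumsum(increments)

returns, path = simulate_mrw(1024, lam=0.1, integral_scale=512, seed=42)
```

The dense $O(n^3)$ eigendecomposition keeps the sketch short but limits it to a few thousand steps; circulant embedding with the FFT is a common alternative for longer stationary Gaussian series.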

Skewed Heavy-Tailed Parametric Distributions: The modified Jones-Faddy skew-t (mJF1) distribution permits asymmetry between gains and losses by assigning different stochastic volatility parameters to positive and negative returns, matching the empirical mean, skewness, and asymmetric tail decay (Shao et al., 29 Dec 2025).
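
The exact mJF1 specification of Shao et al. is not reproduced here. As an illustration of the underlying two-tail idea, the sketch below implements the original Jones–Faddy skew-t density, $f(t; a, b) \propto (1 + t/\sqrt{a+b+t^2})^{a+1/2}(1 - t/\sqrt{a+b+t^2})^{b+1/2}$, whose left tail decays like $|t|^{-(2a+1)}$ and right tail like $t^{-(2b+1)}$, together with its Beta-transform sampler. Choosing $a < b$ gives a heavier loss tail than gain tail, and $a = b = \nu/2$ recovers Student's t with $\nu$ degrees of freedom.

```python
import math
import numpy as np

def jones_faddy_pdf(t, a, b):
    """Jones-Faddy skew-t density; a controls the left tail, b the right tail."""
    t = np.asarray(t, dtype=float)
    s = np.sqrt(a + b + t ** 2)
    log_beta = math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)
    # Normalizing constant 2^(a+b-1) * B(a, b) * sqrt(a+b), computed in log space.
    log_norm = (a + b - 1.0) * math.log(2.0) + log_beta + 0.5 * math.log(a + b)
    return np.exp((a + 0.5) * np.log1p(t / s) + (b + 0.5) * np.log1p(-t / s) - log_norm)

def jones_faddy_sample(a, b, size, rng):
    """Draw via the Beta transform T = sqrt(a+b) (2B - 1) / (2 sqrt(B (1 - B)))."""
    u = rng.beta(a, b, size=size)
    return math.sqrt(a + b) * (2.0 * u - 1.0) / (2.0 * np.sqrt(u * (1.0 - u)))

rng = np.random.default_rng(7)
# a < b: heavier left (loss) tail than right (gain) tail, hence a negative mean.
draws = jones_faddy_sample(1.5, 3.0, size=100_000, rng=rng)
```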

C. Synthetic Derivatives Constructs

Synthetic-Long Positions and Statistical Arbitrage: In option markets, synthetic returns are constructed via put-call parity, combining options with synthetic long or short zero-coupon bond positions. SLSA (Synthetic-Long-Short-Arbitrage) portfolios systematically isolate and exploit pricing deviations and are guaranteed to be neutral to all Black-Scholes risk factors in no-arbitrage regimes (Hong et al., 20 Aug 2025).
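
Put-call parity for European options on a non-dividend-paying asset states $C - P = S - Ke^{-rT}$, so a long call plus a short put replicates a long stock position partly financed by borrowing $Ke^{-rT}$. The plain-Python sketch below uses Black-Scholes prices and a hypothetical mispriced call quote to compute the parity deviation that would serve as a raw arbitrage signal, and checks by finite differences that the parity-hedged position has zero model delta and vega. It illustrates only the parity arithmetic, not the ML-driven SLSA construction of Hong et al.

```python
import math

def norm_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def bs_call_put(S, K, r, sigma, T):
    """Black-Scholes prices of a European call and put (no dividends)."""
    d1 = (math.log(S / K) + (r + 0.5 * sigma ** 2) * T) / (sigma * math.sqrt(T))
    d2 = d1 - sigma * math.sqrt(T)
    disc_k = K * math.exp(-r * T)
    call = S * norm_cdf(d1) - disc_k * norm_cdf(d2)
    put = disc_k * norm_cdf(-d2) - S * norm_cdf(-d1)
    return call, put

def parity_deviation(call, put, S, K, r, T):
    """(C - P) - (S - K e^{-rT}); zero when put-call parity holds."""
    return (call - put) - (S - K * math.exp(-r * T))

def hedged_synthetic_value(S, K, r, sigma, T):
    """Model value of: long call, short put, short stock, long bond paying K at T."""
    call, put = bs_call_put(S, K, r, sigma, T)
    return call - put - S + K * math.exp(-r * T)

def central_diff(f, x, h):
    return (f(x + h) - f(x - h)) / (2.0 * h)

S, K, r, sigma, T = 100.0, 95.0, 0.03, 0.2, 0.5
call, put = bs_call_put(S, K, r, sigma, T)
# Hypothetical market quotes: the call trades 0.40 rich relative to the model.
signal = parity_deviation(call + 0.40, put, S, K, r, T)
delta = central_diff(lambda s: hedged_synthetic_value(s, K, r, sigma, T), S, 1e-3)
vega = central_diff(lambda v: hedged_synthetic_value(S, K, r, v, T), sigma, 1e-4)
```

The hedged position's delta and vega are zero up to round-off, so in the Black-Scholes world its P&L comes only from the parity deviation `signal`; dividends, early exercise, and financing spreads are ignored here.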

D. Synthetic Returns in Algorithmic Reinforcement Learning

State-Associative (SA) Learning: In reinforcement learning, synthetic returns are model-based estimates that attribute future reward directly to particular states, bypassing the slow, noisy step-by-step propagation of value used in standard temporal-difference (TD) learning. The SA model decomposes observed rewards as

$\hat{r}_t = b(s_t) + g(s_t) \cdot \sum_{k=0}^{t-1} c(s_k)$

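where $b(s_t)$ is a state-dependent baseline, $g(s_t)$ is a learned gate taking values in $[0, 1]$, and $c(s_k)$ is the estimated synthetic return credited to the earlier state $s_k$ (Raposo et al., 2021).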
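
The decomposition can be fitted by least squares on the per-step reward, as in the following NumPy sketch. It assumes a tabular setting with hypothetical states `FILLER`, `KEY`, `NO_KEY`, and `DOOR`, a sigmoid gate, plain gradient descent with hand-derived gradients, and a toy two-episode task in which the final reward depends only on the first state. Raposo et al. use neural networks trained alongside an RL agent, so this shows only the credit-assignment mechanics.

```python
import numpy as np

FILLER, KEY, NO_KEY, DOOR = 0, 1, 2, 3   # hypothetical tabular states

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sa_predict(states, c, b, gate_logits):
    """r_hat_t = b(s_t) + g(s_t) * sum_{k<t} c(s_k), with g = sigmoid(gate_logits)."""
    # Exclusive prefix sum: only strictly earlier states contribute.
    prior = np.concatenate(([0.0], np.cumsum(c[states])[:-1]))
    g = sigmoid(gate_logits[states])
    return b[states] + g * prior, g, prior

def sa_loss(episodes, c, b, gate_logits):
    return sum(float(np.sum((sa_predict(s, c, b, gate_logits)[0] - r) ** 2))
               for s, r in episodes)

def train_sa(episodes, n_states, lr=0.02, n_iter=10_000):
    c, b, gl = np.zeros(n_states), np.zeros(n_states), np.zeros(n_states)
    for _ in range(n_iter):
        dc, db, dgl = np.zeros(n_states), np.zeros(n_states), np.zeros(n_states)
        for states, rewards in episodes:
            r_hat, g, prior = sa_predict(states, c, b, gl)
            err = 2.0 * (r_hat - rewards)            # d(squared error) / d(r_hat)
            np.add.at(db, states, err)
            np.add.at(dgl, states, err * g * (1.0 - g) * prior)
            # c(s_k) enters every later prediction t > k with weight g_t.
            weighted = err * g
            later = np.concatenate((np.cumsum(weighted[::-1])[::-1][1:], [0.0]))
            np.add.at(dc, states, later)
        c -= lr * dc
        b -= lr * db
        gl -= lr * dgl
    return c, b, gl

# Toy task: the door pays 1 only if the key was seen three steps earlier.
episodes = [
    (np.array([KEY, FILLER, FILLER, DOOR]), np.array([0.0, 0.0, 0.0, 1.0])),
    (np.array([NO_KEY, FILLER, FILLER, DOOR]), np.array([0.0, 0.0, 0.0, 0.0])),
]
c, b, gl = train_sa(episodes, n_states=4)
```

After training, the synthetic return `c[KEY]` should exceed `c[NO_KEY]`: the model credits the delayed door reward to the state that caused it, while the filler gates shrink so the intervening steps stay uninformative. In Raposo et al., such per-state synthetic returns are then used as an auxiliary reward signal for the agent.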

3. Evaluation Techniques and Metrics

Synthetic return generators are validated with several complementary layers of quantitative metrics:

Statistical similarity: Metrics such as the Kolmogorov–Smirnov statistic, the 1-Wasserstein distance, and KL and JS divergences between empirical and synthetic distributions (Hounwanou et al., 25 Dec 2025).
Temporal structure: Autocorrelation functions (ACF), dynamic time warping (DTW) distance, and volatility clustering indicators are used to quantify time-lagged dependencies (Tepelyan et al., 2023).
Multifractal scaling: The weighted generalized Hurst exponent (wGHE) assesses scaling exponents and their stationarity; the rate at which $\Delta H^w(q_1, q_2)$ falls outside its confidence intervals signals nonstationary multifractality (Morales et al., 2012).
Tail risk: Empirical VaR, ES, sample kurtosis, and Hill tail-index estimates diagnose risk-relevant tail behavior (Cetingoz et al., 7 Jan 2025).
Portfolio performance: Synthetic-data-derived optimal portfolio weights, Sharpe ratios, and realized risk metrics are compared to real-data outcomes as a robust, application-centered validity check (Hounwanou et al., 25 Dec 2025, Tepelyan et al., 2023).
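
Several of these checks can be computed with NumPy and SciPy. The sketch below covers the KS statistic and 1-Wasserstein distance (`scipy.stats.ks_2samp`, `scipy.stats.wasserstein_distance`), the autocorrelation of absolute returns as a volatility-clustering indicator, historical VaR and ES, excess kurtosis, and a Hill estimate of the loss-tail index. KL/JS divergences (which need binning or density estimates), DTW, and wGHE are omitted. The Student-t "real" data, the Gaussian "synthetic" data, and `k_hill` are illustrative assumptions chosen to expose a tail mismatch.

```python
import numpy as np
from scipy import stats

def acf(x, max_lag):
    """Sample autocorrelations at lags 1..max_lag."""
    x = np.asarray(x, dtype=float) - np.mean(x)
    denom = np.dot(x, x)
    return np.array([np.dot(x[:-k], x[k:]) / denom for k in range(1, max_lag + 1)])

def hill_tail_index(x, k):
    """Hill estimate of the loss-tail index from the k largest losses."""
    losses = np.sort(-np.asarray(x, dtype=float))[::-1]   # largest losses first
    return 1.0 / np.mean(np.log(losses[:k] / losses[k]))

def var_es(x, level=0.95):
    """Historical VaR and ES, reported as (negative) return quantities."""
    x = np.asarray(x, dtype=float)
    var = np.quantile(x, 1.0 - level)
    return var, x[x <= var].mean()

def summarize(x, max_lag=10, k_hill=100):
    var, es = var_es(x)
    return {
        "excess_kurtosis": float(stats.kurtosis(x)),       # Fisher definition
        "hill_index": float(hill_tail_index(x, k_hill)),
        "var_95": float(var),
        "es_95": float(es),
        "abs_acf": acf(np.abs(x), max_lag),                # volatility clustering
    }

def compare(real, synth):
    return {
        "ks_stat": float(stats.ks_2samp(real, synth).statistic),
        "wasserstein_1": float(stats.wasserstein_distance(real, synth)),
        "real": summarize(real),
        "synth": summarize(synth),
    }

rng = np.random.default_rng(3)
real = 0.01 * rng.standard_t(3, size=5000)                 # stand-in for observed returns
synth = rng.normal(0.0, real.std(), size=5000)             # stand-in for generator output
report = compare(real, synth)
```

The Gaussian stand-in matches overall volatility but understates kurtosis and tail heaviness. Because both stand-ins are i.i.d., their absolute-return autocorrelations are near zero, whereas real return series typically show slowly decaying positive values, which is what the temporal checks are meant to catch.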

4. Applications in Portfolio, Risk, and Derivatives Modeling

Synthetic returns are employed across a spectrum of quantitative finance workflows:

Portfolio optimization and risk backtesting: Synthetic S&P 500 returns from TimeGAN and VAE yield mean-variance optimal portfolio weights within 1–2 percentage points of real data; synthetic-data Sharpe ratios, volatility, VaR, and ES metrics largely track real-series benchmarks (TimeGAN: volatility 1.30%, VaR$_{0.95}$ = −2.05%, ES$_{0.95}$ = −2.79%) (Hounwanou et al., 25 Dec 2025).
Model risk and identifiability: Advanced evaluation involves retraining the synthetic-data generator on its own output (regurgitative or identifiability testing) and comparing strategy-relevant metrics (e.g., Sharpe-vs-horizon profiles) to ensure model soundness for application (Cetingoz et al., 7 Jan 2025).
Statistical arbitrage in options markets: ML-derived synthetic arbitrage signals enable construction of SLSA portfolios that are provably minimal risk and Greek-neutral, consistently delivering statistically significant positive returns (annualized information ratio 0.1627) (Hong et al., 20 Aug 2025).
DeFi synthetic returns: On-chain synthetic assets, whose prices are pegged algorithmically to reference assets via oracles and arbitrage, define realized synthetic returns as the raw price change plus funding yields, net of collateral costs and operational fees, adjusted for leverage and liquidation risk (Rahman et al., 2022, Meister et al., 2022).
Reinforcement learning with long-term credit assignment: Synthetic returns as auxiliary reward signals in SA-learning dramatically accelerate RL agent convergence where standard TD fails (e.g., Atari Skiing: 25× faster solution vs. prior SOTA) and produce interpretable assignment of reward causality to critical states (Raposo et al., 2021).
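
An application-level check in the spirit of the weight and risk comparisons reported above can be sketched as follows. It assumes simulated "real" returns and a Gaussian refit standing in for TimeGAN or a VAE, and it uses the global minimum-variance portfolio, one point on the mean-variance frontier, because its weights do not depend on noisy mean estimates. `min_variance_weights` and `portfolio_stats` are hypothetical helpers.

```python
import numpy as np

def min_variance_weights(returns):
    """Global minimum-variance weights w = S^{-1} 1 / (1' S^{-1} 1)."""
    cov = np.cov(returns, rowvar=False)
    raw = np.linalg.solve(cov, np.ones(cov.shape[0]))
    return raw / raw.sum()

def portfolio_stats(returns, weights, level=0.95, periods=252):
    """Daily volatility, annualized Sharpe ratio, and historical VaR/ES of the portfolio."""
    port = returns @ weights
    var = np.quantile(port, 1.0 - level)
    return {
        "vol": float(port.std(ddof=1)),
        "sharpe_ann": float(np.sqrt(periods) * port.mean() / port.std(ddof=1)),
        "var": float(var),
        "es": float(port[port <= var].mean()),
    }

rng = np.random.default_rng(11)
true_cov = np.array([[1.0, 0.5, 0.2], [0.5, 1.5, 0.3], [0.2, 0.3, 2.0]]) * 1e-4
real = rng.multivariate_normal([4e-4, 3e-4, 5e-4], true_cov, size=2500)
# Stand-in generator: a Gaussian refitted to the real sample.
synth = rng.multivariate_normal(real.mean(axis=0), np.cov(real, rowvar=False), size=2500)

w_real, w_synth = min_variance_weights(real), min_variance_weights(synth)
weight_gap_pp = 100 * np.max(np.abs(w_real - w_synth))   # percentage points
stats_real = portfolio_stats(real, w_real)
stats_synth = portfolio_stats(synth, w_synth)
```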

5. Caveats and Structural Limitations

Several structural limitations and methodological caveats are established:

Finite-sample bias: Generating large synthetic datasets cannot overcome model bias from limited initial training data. Berry–Esseen-type results formalize the impossibility of bias reduction by oversampling. Best practice is to keep synthetic sample sizes on the order of the original sample size (Cetingoz et al., 7 Jan 2025).
Portfolio sensitivity paradox: Generic GAN/VAE models focus on high-variance principal components, but mean-variance and especially long-short portfolios are most sensitive to low-variance directions, leading to potentially poor synthetic risk or Sharpe estimation in arbitrage or market-neutral strategies unless modeling explicitly targets these subspaces (Cetingoz et al., 7 Jan 2025).
Model risk under extreme events: Even sophisticated joint generators (e.g., conditional flows + CIWAEs) show excess correlations and calibration degradation during structural market breaks (e.g., COVID crash), indicating the need for regime-switching or exogenous driver extension (Tepelyan et al., 2023).
Hybrid/transparent schemes for regulated settings: Transparency and interpretability requirements favor hybrid approaches (e.g., GANs with copula constraints, differentially private VAEs) for deployment in formally regulated financial environments (Hounwanou et al., 25 Dec 2025).
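
The finite-sample point can be illustrated with a toy Monte Carlo experiment, a hedged sketch rather than the formal Berry–Esseen argument. A Gaussian "generator" is refitted to $n$ real returns and then sampled $N$ times, and the synthetic mean is used to estimate the true mean. Its root-mean-square error is $\sigma\sqrt{1/n + 1/N}$, so increasing $N$ removes only the synthetic sampling noise and never pushes the error below the generator's own estimation error $\sigma/\sqrt{n}$. The function name and parameter values are illustrative.

```python
import numpy as np

def synthetic_mean_rmse(n_real, n_synth, n_trials=1000, mu=5e-4, sigma=0.01, seed=0):
    """RMSE of the mean estimated from synthetic data drawn from a refitted Gaussian."""
    rng = np.random.default_rng(seed)
    errors = np.empty(n_trials)
    for i in range(n_trials):
        real = rng.normal(mu, sigma, size=n_real)
        mu_hat, sigma_hat = real.mean(), real.std(ddof=1)      # "train" the generator
        synth = rng.normal(mu_hat, sigma_hat, size=n_synth)    # oversample from it
        errors[i] = synth.mean() - mu
    return float(np.sqrt(np.mean(errors ** 2)))

n = 100
for n_synth in (n, 10 * n, 100 * n):
    # Simulated RMSE next to the theoretical value sigma * sqrt(1/n + 1/N).
    print(n_synth, synthetic_mean_rmse(n, n_synth), 0.01 * np.sqrt(1 / n + 1 / n_synth))
```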

6. Research Directions and Best Practices

Current research systematically advocates:

Multi-angle validation and identifiability: No single statistical test suffices; thorough validation combines marginal, temporal, and downstream-application-based testing, including regurgitative retraining and task-based performance metrics (Cetingoz et al., 7 Jan 2025).
Fidelity to stylized facts: Generators must reproduce fat tails, volatility clustering, leverage effects, and (where appropriate) multifractality. Purpose-built architectures (e.g., TCNs for factors, Student-t mixtures for residuals) are designed to capture these properties (Tepelyan et al., 2023, Morales et al., 2012).
Application-specific generator design: Generator structure and loss functions should reflect the sensitivities of the target downstream application, not only statistical divergence from training data; particular attention should be paid to risk metrics and the directions of greatest portfolio exposure (Cetingoz et al., 7 Jan 2025).

7. Summary Table: Key Models for Synthetic Returns

Model/Approach	Main Feature	Optimal Use Case
TimeGAN	Temporal/marginal fidelity, GAN+AE hybrid	Equity & multi-asset time series
VAE / Normalizing Flow	Stable inference, explicit likelihood	Cross-sectional return modeling
MRW	Multifractal scaling, tail control	Volatility and scaling studies
mJF1 Skew-t	Asymmetric tail specification	Heavy-tailed, skewed returns
SLSA (RNConv)	Pure-arb, Greek-neutral portfolios	Options stat arb / pricing errors
SA-Learning	Direct long-range credit assignment (RL)	RL with long reward delays
DeFi synthetic assets	Smart-contract realized return decomposition	Yield, on-chain protocol analysis

These methodologies provide technical and practical frameworks for the generation, validation, and deployment of synthetic returns, facilitating reproducible, robust, and privacy-preserving innovation in financial modeling and algorithmic research (Hounwanou et al., 25 Dec 2025, Cetingoz et al., 7 Jan 2025, Morales et al., 2012, Hong et al., 20 Aug 2025, Tepelyan et al., 2023, Shao et al., 29 Dec 2025, Raposo et al., 2021).