Bias from overlapping sequence sampling with stride

Investigate and quantify the biases introduced by training on overlapping time-series segments extracted with a fixed stride (here, segments of length 250 with stride 50, so consecutive segments share 200 of their 250 points) when fitting and evaluating the generative model for financial time series trained with maximum mean discrepancy (MMD) under a signature kernel, and determine how these biases affect generalization and statistical validation.

Background

To obtain sufficient training data, the authors extract overlapping segments from the S&P 500 time series using a fixed stride. This increases the sample count but makes the samples strongly dependent (with length 250 and stride 50, each interior observation appears in five segments), which may bias both learning and evaluation.
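The segmentation and its effect on held-out evaluation can be sketched as below. This is a minimal illustration, not the authors' code: it assumes a 1-D NumPy array of returns (synthetic Gaussian noise here, not the S&P 500 data) and that segment i starts at index i * stride. The helpers `make_windows`, `coverage_counts`, `leaked_fraction` and `blocked_split` are hypothetical names. The source does not say how segments were divided between training and evaluation, so both splits are illustrations: a random split of segment indices, and a chronological split that discards the segments straddling the boundary (a purged or embargoed split, a standard technique swapped in here).

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

LENGTH, STRIDE = 250, 50  # segment length and stride quoted in the source

def make_windows(series, length=LENGTH, stride=STRIDE):
    """Overlapping segments of `length` points starting at 0, stride, 2*stride, ..."""
    # sliding_window_view yields every stride-1 window as a read-only view; keep every `stride`-th one.
    return sliding_window_view(series, length)[::stride]

def coverage_counts(n_points, length=LENGTH, stride=STRIDE):
    """Number of segments that contain each time index."""
    counts = np.zeros(n_points, dtype=int)
    for start in range(0, n_points - length + 1, stride):
        counts[start:start + length] += 1
    return counts

def leaked_fraction(test_idx, train_idx, length=LENGTH, stride=STRIDE):
    """Fraction of test segments sharing at least one time index with a training segment."""
    train_starts = np.asarray(train_idx) * stride
    # Segments starting at a and b overlap iff |a - b| < length.
    hits = [np.any(np.abs(train_starts - i * stride) < length) for i in test_idx]
    return float(np.mean(hits))

def blocked_split(n_windows, test_frac=0.2, length=LENGTH, stride=STRIDE):
    """Chronological split that drops the segments straddling the train/test boundary."""
    gap = -(-length // stride) - 1  # ceil(length / stride) - 1 segments must be discarded
    n_test = int(round(n_windows * test_frac))
    test_idx = np.arange(n_windows - n_test, n_windows)
    train_idx = np.arange(0, n_windows - n_test - gap)
    return train_idx, test_idx

rng = np.random.default_rng(0)
series = rng.standard_normal(5000)  # assumption: synthetic stand-in for daily log-returns
X = make_windows(series)
n = len(X)  # (5000 - 250) // 50 + 1 = 96 segments
counts = coverage_counts(len(series))

# Random 80/20 split of segment indices (a common but leaky choice).
perm = rng.permutation(n)
test_idx, train_idx = perm[: n // 5], perm[n // 5 :]
# Chronological split with a gap between training and test segments.
b_train, b_test = blocked_split(n)

print("segments:", n, "max coverage:", counts.max())
print("random split leak:", leaked_fraction(test_idx, train_idx))
print("blocked split leak:", leaked_fraction(b_test, b_train))
```

With the random split essentially every test segment shares observations with some training segment, so held-out scores partly reward memorization; the blocked split shares none, at the cost of four discarded segments.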

The authors explicitly defer the analysis of these biases to future work, so the nature and magnitude of the sampling-induced bias, and its effect on model training and downstream tasks, remain uncharacterized.
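One rough way to gauge the magnitude is an effective sample size. Under a toy assumption of i.i.d. unit-variance increments (real returns show volatility clustering, so this is only indicative), the means of two segments whose starts are k strides apart share L - k*s points and have correlation max(0, 1 - k*s/L). The sketch below computes a Kish-style effective sample size for the average of all segment means and checks the implied variance by simulation. The function names are hypothetical, and the segment mean is only a stand-in for the signature-kernel statistics the model actually uses.

```python
import numpy as np

def window_correlation(k, length=250, stride=50):
    """Correlation of two segment means whose starts are k strides apart (assumes i.i.d. increments)."""
    return max(0.0, 1.0 - k * stride / length)

def effective_sample_size(n_windows, length=250, stride=50):
    """Kish-style ESS, N^2 / sum_ij rho(|i - j|), for the average of N overlapping segment means."""
    total = float(n_windows)  # diagonal terms: rho(0) = 1
    for k in range(1, n_windows):
        rho = window_correlation(k, length, stride)
        if rho == 0.0:
            break  # rho is non-increasing in k, so all further lags are zero too
        total += 2.0 * (n_windows - k) * rho
    return n_windows ** 2 / total

def mean_of_window_means(x, length=250, stride=50):
    """Average over all segments of each segment's mean, computed via cumulative sums."""
    c = np.concatenate(([0.0], np.cumsum(x)))
    starts = np.arange(0, len(x) - length + 1, stride)
    return float(np.mean((c[starts + length] - c[starts]) / length))

T, L, S = 5000, 250, 50
n = (T - L) // S + 1
ess = effective_sample_size(n, L, S)

# Monte Carlo check: for unit-variance i.i.d. increments,
# Var(average of segment means) = 1 / (L * ESS).
rng = np.random.default_rng(1)
est = np.array([mean_of_window_means(rng.standard_normal(T), L, S) for _ in range(4000)])
predicted = 1.0 / (L * ess)
print(f"nominal segments: {n}, effective: {ess:.2f}, non-overlapping: {T // L}")
print(f"simulated / predicted variance: {est.var(ddof=1) / predicted:.3f}")
```

Under this toy model the 96 nominal segments behave like about 19.5 independent ones, close to the 20 non-overlapping segments the same 5000 points would yield, so a test that treats the segments as independent would understate standard errors by a factor of roughly √5.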

References

This type of sampling will create biases in the training data but we leave the exploration of this issue to future work.

Generative model for financial time series trained with MMD using a signature kernel (2407.19848 - Lu et al., 29 Jul 2024) in Section 5.1 (Data Preprocessing)