
PCF-GAN: generating sequential data via the characteristic function of measures on the path space (2305.12511v2)

Published 21 May 2023 in cs.LG

Abstract: Generating high-fidelity time series data using generative adversarial networks (GANs) remains a challenging task, as it is difficult to capture the temporal dependence of joint probability distributions induced by time-series data. Towards this goal, a key step is the development of an effective discriminator to distinguish between time series distributions. We propose the so-called PCF-GAN, a novel GAN that incorporates the path characteristic function (PCF) as the principled representation of time series distribution into the discriminator to enhance its generative performance. On the one hand, we establish theoretical foundations of the PCF distance by proving its characteristicity, boundedness, differentiability with respect to generator parameters, and weak continuity, which ensure the stability and feasibility of training the PCF-GAN. On the other hand, we design efficient initialisation and optimisation schemes for PCFs to strengthen the discriminative power and accelerate training efficiency. To further boost the capabilities of complex time series generation, we integrate the auto-encoder structure via sequential embedding into the PCF-GAN, which provides additional reconstruction functionality. Extensive numerical experiments on various datasets demonstrate the consistently superior performance of PCF-GAN over state-of-the-art baselines, in both generation and reconstruction quality. Code is available at https://github.com/DeepIntoStreams/PCF-GAN.

Authors (3)
  1. Hang Lou (3 papers)
  2. Siran Li (49 papers)
  3. Hao Ni (43 papers)
Citations (10)

Summary

Overview of "PCF-GAN: Generating Sequential Data via the Characteristic Function of Measures on the Path Space"

The paper introduces PCF-GAN, a novel generative adversarial network (GAN) that employs the path characteristic function (PCF) to improve the generation of high-fidelity time series data, a task that has proven challenging because it requires capturing the temporal dependence of the joint probability distributions induced by time series. PCF-GAN embeds the PCF, a principled representation of the time-series distribution, into the discriminator so that these dependencies are captured robustly, thereby improving generative performance.

Key innovations of PCF-GAN include leveraging theoretical properties of the PCF, namely characteristicity, boundedness, differentiability with respect to generator parameters, and weak continuity, which together ensure training stability and feasibility. These guarantees are complemented by efficient initialisation and optimisation schemes that strengthen the discriminator's power and accelerate training. Additionally, an auto-encoder structure based on sequential embedding is integrated into PCF-GAN, providing reconstruction functionality for complex time series.
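The core computation behind the PCF, developing each sample path into the unitary group and averaging, can be sketched as follows. This is an illustrative NumPy implementation under stated assumptions, not the authors' code: the random anti-Hermitian matrices stand in for the trainable linear maps that PCF-GAN optimises, and paths are taken to be piecewise linear.

```python
import numpy as np

def expm_anti_hermitian(A):
    """Matrix exponential of an anti-Hermitian A. Since iA is Hermitian,
    eigh gives iA = V diag(w) V^H with real w, so
    exp(A) = V diag(exp(-i w)) V^H, which is unitary."""
    w, V = np.linalg.eigh(1j * A)
    return (V * np.exp(-1j * w)) @ V.conj().T

def unitary_development(path, M):
    """Develop a piecewise-linear path of shape (T, channels) into U(d),
    using one anti-Hermitian d x d matrix per channel:
    U = prod_t exp(sum_c M_c * dx_{t,c})."""
    d = M[0].shape[0]
    U = np.eye(d, dtype=complex)
    for dx in np.diff(path, axis=0):
        A = sum(Mc * dxc for Mc, dxc in zip(M, dx))  # stays anti-Hermitian
        U = U @ expm_anti_hermitian(A)
    return U

def empirical_pcf(paths, M):
    """Empirical PCF: the average unitary development over sample paths."""
    return np.mean([unitary_development(p, M) for p in paths], axis=0)

def pcfd_sq(paths_x, paths_y, Ms):
    """Squared empirical PCF distance, averaged over test matrices Ms,
    using the Hilbert-Schmidt (Frobenius) norm."""
    return float(np.mean([
        np.linalg.norm(empirical_pcf(paths_x, M) - empirical_pcf(paths_y, M),
                       "fro") ** 2
        for M in Ms
    ]))

def random_anti_hermitian(d, channels, rng):
    """Random anti-Hermitian matrices, one per path channel."""
    out = []
    for _ in range(channels):
        B = rng.standard_normal((d, d)) + 1j * rng.standard_normal((d, d))
        out.append(B - B.conj().T)
    return out
```

Because every development is a unitary matrix, the empirical PCF is uniformly bounded regardless of the path, which is the intuition behind the boundedness property the paper proves.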

Theoretical Contributions

The paper thoroughly establishes the theoretical foundation of the PCF, examining essential properties such as boundedness and differentiability with respect to generator parameters. One notable theoretical insight is that the PCF distance extends the Integral Probability Metric (IPM) approach traditionally used in the GAN literature, which includes discriminators based on the Wasserstein distance and Maximum Mean Discrepancy (MMD).
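For context, an IPM compares two laws through a class of test functions, and IPM-based GANs pit the generator against the worst-case test function. In standard notation (not specific to this paper):

```latex
d_{\mathcal{F}}(\mu, \nu)
  \;=\; \sup_{f \in \mathcal{F}}
  \Big|\, \mathbb{E}_{X \sim \mu}[f(X)] - \mathbb{E}_{Y \sim \nu}[f(Y)] \,\Big|,
\qquad
\min_{\theta} \; d_{\mathcal{F}}\big(\mu_{\mathrm{data}},\, (G_{\theta})_{\#}\mathbb{P}_Z\big).
```

Taking $\mathcal{F}$ to be the 1-Lipschitz functions yields the Wasserstein-1 distance, and taking the unit ball of an RKHS yields MMD; in PCF-GAN the analogous adversarial optimisation runs over the linear maps defining the unitary developments.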

By utilizing the unitary feature of paths from rough path theory, the authors circumvent the challenges posed by the infinite-dimensionality of path space. This approach generalises classical results on characteristic functions of measures on finite-dimensional spaces, which is especially valuable when time series are viewed in continuous time.
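Concretely, writing $M$ for a linear map from the path's state space into anti-Hermitian $d \times d$ matrices, the unitary development of a path $X$ solves a linear controlled differential equation, and the PCF and PCF distance take (up to notational differences from the paper) the form:

```latex
dU_t = U_t \, M(dX_t), \quad U_0 = I_d,
\qquad
\Phi_X(M) = \mathbb{E}\big[\,U_T\,\big],
\qquad
\mathrm{PCFD}^2(X, Y)
  \;=\; \mathbb{E}_{M \sim \mathbb{P}_M}
  \big\| \Phi_X(M) - \Phi_Y(M) \big\|_{\mathrm{HS}}^2 .
```

Since each $U_T$ is unitary, $\|\Phi_X(M)\|_{\mathrm{HS}} \le \sqrt{d}$, which underlies the boundedness and weak-continuity properties established in the paper.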

Empirical Performance

Empirically, PCF-GAN outperforms existing generative models across a range of datasets in both generation and reconstruction quality. The reported results are strongest on datasets exhibiting complex temporal dependencies, underscoring the method's robustness.

Implications and Future Directions

The use of the PCF in GAN architectures offers promising avenues for both theory and practice. Theoretically, its adaptability as a metric on path space could enable new methodologies in statistics and machine learning. Practically, PCF-GAN suits applications such as privacy-preserving synthetic data generation, which is crucial in sectors like healthcare and finance where real-world data may be sensitive.

Future research could explore further integration of PCF-based distance metrics in various GAN discriminator designs, potentially yielding improved performance across broader applications. Incorporating advances in sequential models, such as transformers within PCF-GAN's auto-encoder architecture, may also enhance generation capabilities for more complex data inputs, such as video or high-dimensional sensor data.

In conclusion, PCF-GAN represents a significant step towards generating realistic sequential data: it systematically addresses the core challenge of temporal dependence in time series, rests on a strong theoretical framework, and is validated by empirical success.
