A Comprehensive Survey on Generative Diffusion Models for Structured Data (2306.04139v2)

Published 7 Jun 2023 in cs.LG and cs.AI

Abstract: In recent years, generative diffusion models have brought about a rapid paradigm shift in deep generative modelling, showing groundbreaking performance across various applications. Meanwhile, structured data, encompassing tabular and time series data, has received comparatively limited attention from the deep learning research community, despite its omnipresence and extensive applications. As a result, the literature on modelling structured data with diffusion models, and reviews of that literature, remain scarce compared with other data modalities such as visual and textual data. To address this gap, we present a comprehensive review of recently proposed diffusion models in the field of structured data. First, this survey provides a concise overview of score-based diffusion model theory, then proceeds to technical descriptions of the majority of pioneering works that used structured data in both data-driven general tasks and domain-specific applications. Thereafter, we analyse and discuss the limitations and challenges shown in existing works and suggest potential research directions. We hope this review serves as a catalyst for the research community, promoting developments in generative diffusion models for structured data.

