
Balanced Marginal and Joint Distributional Learning via Mixture Cramer-Wold Distance (2312.03307v1)

Published 6 Dec 2023 in stat.ML and cs.LG

Abstract: In the process of training a generative model, it becomes essential to measure the discrepancy between two high-dimensional probability distributions: the generative distribution and the ground-truth distribution of the observed dataset. Recently, there has been growing interest in an approach that involves slicing high-dimensional distributions, with the Cramer-Wold distance emerging as a promising method. However, we have identified that the Cramer-Wold distance primarily focuses on joint distributional learning, whereas understanding marginal distributional patterns is crucial for effective synthetic data generation. In this paper, we introduce a novel measure of dissimilarity, the mixture Cramer-Wold distance. This measure enables us to capture both marginal and joint distributional information simultaneously, as it incorporates a mixture measure with point masses on standard basis vectors. Building upon the mixture Cramer-Wold distance, we propose a new generative model called CWDAE (Cramer-Wold Distributional AutoEncoder), which shows remarkable performance in generating synthetic data when applied to real tabular datasets. Furthermore, our model offers the flexibility to adjust the level of data privacy with ease.
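The core idea in the abstract is to replace the slicing measure of the Cramer-Wold distance with a mixture: point masses on the standard basis vectors (so each slice recovers a coordinate's marginal distribution) mixed with directions spread over the unit sphere (which capture joint structure). As a rough illustration of that mixture-of-slices idea only, the sketch below compares projected samples with a simple sorted-sample (quantile) discrepancy; the paper's actual distance uses a smoothed Cramer-Wold kernel, and the names `alpha`, `n_random`, and `slice_discrepancy_1d` are illustrative choices, not the authors' notation.

```python
import numpy as np

def slice_discrepancy_1d(u, v):
    # Discrepancy between two equal-size 1-D samples: mean squared
    # difference of sorted values (a quantile-based proxy; the paper
    # uses a smoothed Cramer-Wold kernel instead).
    return np.mean((np.sort(u) - np.sort(v)) ** 2)

def mixture_cw_distance(X, Y, n_random=50, alpha=0.5, seed=None):
    """Sketch of a mixture sliced distance between samples X, Y of shape (n, d).

    alpha weights the standard-basis (marginal) slices; the remaining
    1 - alpha weight goes to random directions (joint slices).
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape

    # Joint part: random unit directions drawn on the sphere.
    dirs = rng.normal(size=(n_random, d))
    dirs /= np.linalg.norm(dirs, axis=1, keepdims=True)
    joint = np.mean([slice_discrepancy_1d(X @ w, Y @ w) for w in dirs])

    # Marginal part: point masses on the standard basis vectors e_1..e_d,
    # so each slice is exactly one coordinate's marginal distribution.
    marginal = np.mean([slice_discrepancy_1d(X[:, j], Y[:, j])
                        for j in range(d)])

    return alpha * marginal + (1 - alpha) * joint
```

With `alpha = 0` this collapses to a purely joint sliced distance; the mixture term is what forces the generator to also match each marginal, which the abstract argues is crucial for tabular synthetic data.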

