
SynthEval: A Framework for Detailed Utility and Privacy Evaluation of Tabular Synthetic Data (2404.15821v1)

Published 24 Apr 2024 in cs.LG and cs.PF

Abstract: With the growing demand for synthetic data to address contemporary issues in machine learning, such as data scarcity, data fairness, and data privacy, having robust tools for assessing the utility and potential privacy risks of such data becomes crucial. SynthEval, a novel open-source evaluation framework, distinguishes itself from existing tools by treating categorical and numerical attributes with equal care, without assuming any special kind of preprocessing steps. This makes it applicable to virtually any synthetic dataset of tabular records. Our tool leverages statistical and machine learning techniques to comprehensively evaluate synthetic data fidelity and privacy-preserving integrity. SynthEval integrates a wide selection of metrics that can be used independently or in highly customisable benchmark configurations, and can easily be extended with additional metrics. In this paper, we describe SynthEval and illustrate its versatility with examples. The framework facilitates better benchmarking and more consistent comparisons of model capabilities.

Evaluating the Utility and Privacy of Synthetic Tabular Data with SynthEval

Background and Motivation

Synthetic data has gained popularity as an alternative to real-world datasets where privacy must be preserved, for example in sensitive fields such as healthcare. It can emulate the statistical properties of real data without exposing personal information, which makes it a valuable resource in data science. The challenge lies in adequately evaluating both the utility and the privacy of synthetic data to ensure that it is effective and safe to use. To address this need, the paper introduces SynthEval, an open-source Python framework that evaluates synthetic tabular data across numerous dimensions of utility and privacy.

Related Work in Evaluation Frameworks

The paper discusses existing evaluation tools such as Synthcity and SDMetrics and highlights issues with ease of use and adaptability, particularly when handling mixed-type data (numerical and categorical). According to the paper, most tools offer limited customization and require extensive setup, which constrains their effectiveness across diverse datasets. SynthEval addresses these limitations by providing a flexible, easily extensible tool with built-in support for mixed data types.
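To make the mixed-type problem concrete, the sketch below computes a Gower-style dissimilarity between two records, one common way to compare rows that contain both numerical and categorical attributes. It is an illustration only and not SynthEval's implementation: the function name `gower_distance`, its parameters, and the example records are all assumptions introduced here.

```python
def gower_distance(a, b, is_categorical, ranges):
    """Average per-attribute dissimilarity between two records.

    Hypothetical helper for illustration, not part of SynthEval.
    Categorical attributes contribute 0 if equal and 1 otherwise.
    Numerical attributes contribute |x - y| divided by the attribute's
    range, so every attribute lands in [0, 1].
    """
    total = 0.0
    for x, y, cat, rng in zip(a, b, is_categorical, ranges):
        if cat:
            total += 0.0 if x == y else 1.0
        else:
            # A zero range means the attribute is constant; treat it as no difference.
            total += abs(x - y) / rng if rng > 0 else 0.0
    return total / len(a)


# Made-up records: (age, sex, lab value). The range entry for the
# categorical attribute is a placeholder and is never used.
real_record = (40.0, "female", 1.2)
synthetic_record = (50.0, "male", 1.2)
d = gower_distance(
    real_record,
    synthetic_record,
    is_categorical=(False, True, False),
    ranges=(50.0, 1.0, 2.0),
)
print(round(d, 3))
```

Because every attribute is scaled to the same interval, neither type dominates the distance, which is the kind of equal treatment of categorical and numerical attributes that the abstract emphasizes.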

SynthEval Framework Description

SynthEval distinguishes itself with several innovative features. Key among these is its dual capability to evaluate both privacy and utility through a variety of metrics. The framework allows for extensive customization, enabling users to tailor evaluations based on specific needs.

  • Metrics Customization: Includes 30 metrics, covering aspects such as correlation differences, mutual information, and identification risk. Each metric can be configured for different data types and evaluation goals.
  • Benchmarking and Extensibility: Users can benchmark multiple datasets simultaneously with results aggregated in a comprehensible format that ranks synthetic datasets against key metrics. Additionally, SynthEval is designed for easy integration of new custom metrics.
  • Detailed Reporting: Beyond numeric summaries, SynthEval can generate detailed reports and visual aids to assist in interpreting evaluation results, aiding stakeholders in making informed decisions about synthetic data usability.
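As an example of the kind of utility metric listed above, the following sketch computes a correlation difference: the Frobenius norm of the difference between the Pearson correlation matrices of the real and synthetic data. This is a plain-Python illustration under simplifying assumptions (numerical columns only, no constant columns); the function names are hypothetical and do not reflect SynthEval's API or its exact definition of the metric.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant numeric columns."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    std_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    std_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (std_x * std_y)


def correlation_matrix(columns):
    """Pairwise Pearson correlations between all columns."""
    return [[pearson(ci, cj) for cj in columns] for ci in columns]


def correlation_difference(real_columns, synthetic_columns):
    """Frobenius norm of the difference between the two correlation matrices."""
    r = correlation_matrix(real_columns)
    s = correlation_matrix(synthetic_columns)
    k = len(r)
    return math.sqrt(sum((r[i][j] - s[i][j]) ** 2 for i in range(k) for j in range(k)))


# Made-up data: the real columns are perfectly correlated, while the
# synthetic columns are perfectly anti-correlated.
real = [[1.0, 2.0, 3.0, 4.0], [2.0, 4.0, 6.0, 8.0]]
synthetic = [[1.0, 2.0, 3.0, 4.0], [8.0, 6.0, 4.0, 2.0]]
score = correlation_difference(real, synthetic)
print(round(score, 3))
```

A score of zero means the synthetic data reproduces the pairwise linear relationships exactly; larger values indicate that dependencies between columns were lost or distorted. Handling categorical columns would require a different association measure.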

Privacy and Utility Trade-offs

The application of SynthEval is illustrated through an example in which synthetic data is generated with several methods, including GANs and Bayesian networks. Using SynthEval, researchers can compare the utility and privacy levels of each method and tune the generation parameters to improve the balance between the two.
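One simple way to compare several generators across utility and privacy at once is to rank them on each metric and sum the ranks. The sketch below shows that idea; it is not taken from the paper, the generator names and scores are invented for illustration, and `rank_datasets` is a hypothetical helper rather than a SynthEval function.

```python
def rank_datasets(results, higher_is_better):
    """Order dataset names by the sum of their per-metric ranks (lower sum is better).

    results: {dataset_name: {metric_name: value}}
    higher_is_better: {metric_name: True if larger values are preferable}
    Ties keep their original insertion order because Python's sort is stable.
    """
    names = list(results)
    totals = {name: 0 for name in names}
    for metric, better_high in higher_is_better.items():
        ordered = sorted(names, key=lambda n: results[n][metric], reverse=better_high)
        for position, name in enumerate(ordered, start=1):
            totals[name] += position
    return sorted(names, key=lambda n: totals[n])


# Invented scores for three hypothetical generators.
results = {
    "generator_a": {"utility": 0.80, "privacy_risk": 0.10},
    "generator_b": {"utility": 0.75, "privacy_risk": 0.30},
    "generator_c": {"utility": 0.90, "privacy_risk": 0.40},
}
ranking = rank_datasets(results, {"utility": True, "privacy_risk": False})
print(ranking)
```

In this made-up case, generator_c has the best utility but the worst privacy risk, so the more balanced generator_a comes out on top. Rank sums weight all metrics equally; a real benchmark might weight or normalize metrics differently.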

Implications and Future Directions

The introduction of SynthEval could significantly advance how researchers and practitioners in various fields assess and utilize synthetic data. Its flexibility and extensive metric library facilitate a more nuanced understanding of synthetic data's performance, paving the way for broader adoption and trust in synthetic datasets.

There is potential for future work to expand SynthEval's metric library, enhance its usability, and refine its adaptability across more diverse dataset conditions. Continued development and community involvement are crucial to maintaining its relevance and effectiveness in dynamic research and application landscapes.

In summary, SynthEval presents a robust framework for the detailed evaluation of synthetic tabular data, addressing both utility and privacy comprehensively. Its approach sets a new standard in the field, potentially aiding numerous projects and research initiatives that rely on high-quality, privacy-preserving synthetic data.

IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. 
[2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Emam, K.E., Mosquera, L., Bass, J.: Evaluating identity disclosure risk in fully synthetic health data: Model development and validation. Journal of Medical Internet Research 22(11), 23139 (2020) https://doi.org/10.2196/23139 Abouelmehdi et al. [2018] Abouelmehdi, K., Beni-Hessane, A., Khaloufi, H.: Big healthcare data: preserving security and privacy. Journal of Big Data 5(1) (2018) https://doi.org/10.1186/s40537-017-0110-7 Nowok et al. [2016] Nowok, B., Raab, G.M., Dibben, C.: synthpop: Bespoke creation of synthetic data in r. Journal of Statistical Software 74(11) (2016) https://doi.org/10.18637/jss.v074.i11 Rankin et al. [2020] Rankin, D., Black, M., Bond, R., Wallace, J., Mulvenna, M., Epelde, G.: Reliability of supervised machine learning using synthetic data in health care: Model to preserve privacy for data sharing. JMIR Medical Informatics 8(7), 18910 (2020) https://doi.org/10.2196/18910 van Breugel et al. [2021] Breugel, B., Kyono, T., Berrevoets, J., Schaar, M.: DECAF: generating fair synthetic data using causally-aware generative networks. In: Ranzato, M., Beygelzimer, A., Dauphin, Y.N., Liang, P., Vaughan, J.W. (eds.) Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6-14, 2021, Virtual, pp. 22221–22233 (2021) Dankar et al. [2022] Dankar, F.K., Ibrahim, M.K., Ismail, L.: A multi-dimensional evaluation of synthetic data generators. 
https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Rankin, D., Black, M., Bond, R., Wallace, J., Mulvenna, M., Epelde, G.: Reliability of supervised machine learning using synthetic data in health care: Model to preserve privacy for data sharing. JMIR Medical Informatics 8(7), 18910 (2020) https://doi.org/10.2196/18910 van Breugel et al. [2021] Breugel, B., Kyono, T., Berrevoets, J., Schaar, M.: DECAF: generating fair synthetic data using causally-aware generative networks. In: Ranzato, M., Beygelzimer, A., Dauphin, Y.N., Liang, P., Vaughan, J.W. (eds.) Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6-14, 2021, Virtual, pp. 22221–22233 (2021) Dankar et al. [2022] Dankar, F.K., Ibrahim, M.K., Ismail, L.: A multi-dimensional evaluation of synthetic data generators. IEEE Access 10, 11147–11158 (2022) https://doi.org/10.1109/access.2022.3144765 Figueira and Vaz [2022] Figueira, A., Vaz, B.: Survey on synthetic data generation, evaluation methods and GANs. 
Mathematics 10(15), 2733 (2022) https://doi.org/10.3390/math10152733 Dwork and Roth [2013] Dwork, C., Roth, A.: The algorithmic foundations of differential privacy. Foundations and Trends® in Theoretical Computer Science 9(3-4), 211–487 (2013) https://doi.org/10.1561/0400000042 Yale et al. [2020] Yale, A., Dash, S., Bhanot, K., Guyon, I., Erickson, J.S., Bennett, K.P.: Synthesizing quality open data assets from private health research studies. In: Abramowicz, W., Klein, G. (eds.) Business Information Systems Workshops - BIS 2020 International Workshops, Colorado Springs, CO, USA, June 8-10, 2020, Revised Selected Papers. Lecture Notes in Business Information Processing, vol. 394, pp. 324–335 (2020). https://doi.org/10.1007/978-3-030-61146-0_26 Lenz et al. [2021] Lenz, S., Hess, M., Binder, H.: Deep generative models in DataSHIELD. BMC Medical Research Methodology 21(1) (2021) https://doi.org/10.1186/s12874-021-01237-6 Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X., El-Hussuna, A.: Utility metrics for evaluating synthetic health data generation methods: Validation study. JMIR Medical Informatics 10(4), 35734 (2022) https://doi.org/10.2196/35734 Qian et al. [2023] Qian, Z., Cebere, B., Schaar, M.: Synthcity: facilitating innovative use cases of synthetic data in different data modalities. arXiv, preprint (2023). https://doi.org/10.48550/arXiv.2301.07573 DataCebo, Inc. [2023] DataCebo, Inc.: Synthetic Data Metrics. (2023). DataCebo, Inc.. Version 0.9.3. https://docs.sdv.dev/sdmetrics/ Brenninkmeijer [2021] Brenninkmeijer, B.: Table Evaluator. GitHub (2021) Yale et al. [2019] Yale, A., Dash, S., Dutta, R., Guyon, I., Pavao, A., Bennett, K.P.: Privacy preserving synthetic health data. In: 27th European Symposium on Artificial Neural Networks, ESANN 2019, Bruges, Belgium, April 24-26, 2019 (2019) Yan et al. 
[2022] Yan, C., Yan, Y., Wan, Z., Zhang, Z., Omberg, L., Guinney, J., Mooney, S.D., Malin, B.A.: A multifaceted benchmarking of synthetic electronic health record generation models. Nature Communications 13(1) (2022) https://doi.org/10.1038/s41467-022-35295-1 Murtaza et al. [2023] Murtaza, H., Ahmed, M., Khan, N.F., Murtaza, G., Zafar, S., Bano, A.: Synthetic data generation: State of the art in health care domain. Computer Science Review 48, 100546 (2023) https://doi.org/10.1016/j.cosrev.2023.100546 [22] Lautrup, A.D., Hyrup, T., Zimek, A., Schneider-Kamp, P.: Systematised review of generative modelling tools and utility metrics for fully synthetic tabular data. [under review] Gower [1971] Gower, J.C.: A general coefficient of similarity and some of its properties. Biometrics 27(4), 857–871 (1971) https://doi.org/10.2307/2528823 Kamal et al. [2019] Kamal, S., ElEleimy, M., Hegazy, D., Nasr, M.: Hepatitis C Virus (HCV) for Egyptian patients. UCI Machine Learning Repository, dataset (2019). https://doi.org/10.24432/c5989v Chawla et al. [2002] Chawla, N.V., Bowyer, K.W., Hall, L.O., Kegelmeyer, W.P.: SMOTE: synthetic minority over-sampling technique. J. Artif. Intell. Res. 16, 321–357 (2002) https://doi.org/10.1613/jair.953 Yoon et al. [2020] Yoon, J., Drumright, L.N., Schaar, M.: Anonymization through data synthesis using generative adversarial networks (ADS-GAN). IEEE J. Biomed. Health Informatics 24(8), 2378–2388 (2020) https://doi.org/10.1109/JBHI.2020.2980262 Xu et al. [2019] Xu, L., Skoularidou, M., Cuesta-Infante, A., Veeramachaneni, K.: Modeling tabular data using conditional GAN. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 7333–7343 (2019) Goodfellow et al. 
[2014] Goodfellow, I.J., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., Bengio, Y.: Generative Adversarial Networks. arXiv, preprint (2014). https://doi.org/10.48550/arXiv.1406.2661 Sun and Erath [2015] Sun, L., Erath, A.: A bayesian network approach for population synthesis. Transportation Research Part C: Emerging Technologies 61, 49–62 (2015) https://doi.org/10.1016/j.trc.2015.10.010 Reiter [2005] Reiter, J.P.: Using cart to generate partially synthetic public use microdata. Journal of Official Statistics 21(3), 441–462 (2005) Drechsler and Reiter [2011] Drechsler, J., Reiter, J.P.: An empirical evaluation of easily implemented, nonparametric methods for generating synthetic datasets. Comput. Stat. Data Anal. 55(12), 3232–3243 (2011) European Medicines Agency [2018] European Medicines Agency: External guidance on the implementation of the European Medicines Agency policy on the publication of clinical data for medicinal products for human use. https://www.ema.europa.eu/en/human-regulatory/marketing-authorisation/clinical-data-publication/support-industry/external-guidance-implementation-european-medicines-agency-policy-publication-clinical-data (2018) Health Canada [2019] Health Canada: Public Release of Clinical Information: guidance document. https://www.canada.ca/en/health-canada/services/drug-health-product-review-approval/profile-public-release-clinical-information-guidance/document.html (2019) Karr et al. [2006] Karr, A.F., Kohnen, C.N., Oganian, A., Reiter, J.P., Sanil, A.P.: A framework for evaluating the utility of data altered to protect confidentiality. The American Statistician 60(3), 224–232 (2006) https://doi.org/10.1198/000313006x124640 Zhu et al. [2022] Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) 
IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). https://doi.org/10.1109/BigData55660.2022.10020639 Villani [2009] Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. [2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. 
[2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Breugel, B., Kyono, T., Berrevoets, J., Schaar, M.: DECAF: generating fair synthetic data using causally-aware generative networks. In: Ranzato, M., Beygelzimer, A., Dauphin, Y.N., Liang, P., Vaughan, J.W. (eds.) Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6-14, 2021, Virtual, pp. 22221–22233 (2021) Dankar et al. [2022] Dankar, F.K., Ibrahim, M.K., Ismail, L.: A multi-dimensional evaluation of synthetic data generators. IEEE Access 10, 11147–11158 (2022) https://doi.org/10.1109/access.2022.3144765 Figueira and Vaz [2022] Figueira, A., Vaz, B.: Survey on synthetic data generation, evaluation methods and GANs. 
Mathematics 10(15), 2733 (2022) https://doi.org/10.3390/math10152733 Dwork and Roth [2013] Dwork, C., Roth, A.: The algorithmic foundations of differential privacy. Foundations and Trends® in Theoretical Computer Science 9(3-4), 211–487 (2013) https://doi.org/10.1561/0400000042 Yale et al. [2020] Yale, A., Dash, S., Bhanot, K., Guyon, I., Erickson, J.S., Bennett, K.P.: Synthesizing quality open data assets from private health research studies. In: Abramowicz, W., Klein, G. (eds.) Business Information Systems Workshops - BIS 2020 International Workshops, Colorado Springs, CO, USA, June 8-10, 2020, Revised Selected Papers. Lecture Notes in Business Information Processing, vol. 394, pp. 324–335 (2020). https://doi.org/10.1007/978-3-030-61146-0_26 Lenz et al. [2021] Lenz, S., Hess, M., Binder, H.: Deep generative models in DataSHIELD. BMC Medical Research Methodology 21(1) (2021) https://doi.org/10.1186/s12874-021-01237-6 Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X., El-Hussuna, A.: Utility metrics for evaluating synthetic health data generation methods: Validation study. JMIR Medical Informatics 10(4), 35734 (2022) https://doi.org/10.2196/35734 Qian et al. [2023] Qian, Z., Cebere, B., Schaar, M.: Synthcity: facilitating innovative use cases of synthetic data in different data modalities. arXiv, preprint (2023). https://doi.org/10.48550/arXiv.2301.07573 DataCebo, Inc. [2023] DataCebo, Inc.: Synthetic Data Metrics. (2023). DataCebo, Inc.. Version 0.9.3. https://docs.sdv.dev/sdmetrics/ Brenninkmeijer [2021] Brenninkmeijer, B.: Table Evaluator. GitHub (2021) Yale et al. [2019] Yale, A., Dash, S., Dutta, R., Guyon, I., Pavao, A., Bennett, K.P.: Privacy preserving synthetic health data. In: 27th European Symposium on Artificial Neural Networks, ESANN 2019, Bruges, Belgium, April 24-26, 2019 (2019) Yan et al. 
[2022] Yan, C., Yan, Y., Wan, Z., Zhang, Z., Omberg, L., Guinney, J., Mooney, S.D., Malin, B.A.: A multifaceted benchmarking of synthetic electronic health record generation models. Nature Communications 13(1) (2022) https://doi.org/10.1038/s41467-022-35295-1 Murtaza et al. [2023] Murtaza, H., Ahmed, M., Khan, N.F., Murtaza, G., Zafar, S., Bano, A.: Synthetic data generation: State of the art in health care domain. Computer Science Review 48, 100546 (2023) https://doi.org/10.1016/j.cosrev.2023.100546 [22] Lautrup, A.D., Hyrup, T., Zimek, A., Schneider-Kamp, P.: Systematised review of generative modelling tools and utility metrics for fully synthetic tabular data. [under review] Gower [1971] Gower, J.C.: A general coefficient of similarity and some of its properties. Biometrics 27(4), 857–871 (1971) https://doi.org/10.2307/2528823 Kamal et al. [2019] Kamal, S., ElEleimy, M., Hegazy, D., Nasr, M.: Hepatitis C Virus (HCV) for Egyptian patients. UCI Machine Learning Repository, dataset (2019). https://doi.org/10.24432/c5989v Chawla et al. [2002] Chawla, N.V., Bowyer, K.W., Hall, L.O., Kegelmeyer, W.P.: SMOTE: synthetic minority over-sampling technique. J. Artif. Intell. Res. 16, 321–357 (2002) https://doi.org/10.1613/jair.953 Yoon et al. [2020] Yoon, J., Drumright, L.N., Schaar, M.: Anonymization through data synthesis using generative adversarial networks (ADS-GAN). IEEE J. Biomed. Health Informatics 24(8), 2378–2388 (2020) https://doi.org/10.1109/JBHI.2020.2980262 Xu et al. [2019] Xu, L., Skoularidou, M., Cuesta-Infante, A., Veeramachaneni, K.: Modeling tabular data using conditional GAN. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 7333–7343 (2019) Goodfellow et al. 
[2014] Goodfellow, I.J., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., Bengio, Y.: Generative Adversarial Networks. arXiv, preprint (2014). https://doi.org/10.48550/arXiv.1406.2661 Sun and Erath [2015] Sun, L., Erath, A.: A bayesian network approach for population synthesis. Transportation Research Part C: Emerging Technologies 61, 49–62 (2015) https://doi.org/10.1016/j.trc.2015.10.010 Reiter [2005] Reiter, J.P.: Using cart to generate partially synthetic public use microdata. Journal of Official Statistics 21(3), 441–462 (2005) Drechsler and Reiter [2011] Drechsler, J., Reiter, J.P.: An empirical evaluation of easily implemented, nonparametric methods for generating synthetic datasets. Comput. Stat. Data Anal. 55(12), 3232–3243 (2011) European Medicines Agency [2018] European Medicines Agency: External guidance on the implementation of the European Medicines Agency policy on the publication of clinical data for medicinal products for human use. https://www.ema.europa.eu/en/human-regulatory/marketing-authorisation/clinical-data-publication/support-industry/external-guidance-implementation-european-medicines-agency-policy-publication-clinical-data (2018) Health Canada [2019] Health Canada: Public Release of Clinical Information: guidance document. https://www.canada.ca/en/health-canada/services/drug-health-product-review-approval/profile-public-release-clinical-information-guidance/document.html (2019) Karr et al. [2006] Karr, A.F., Kohnen, C.N., Oganian, A., Reiter, J.P., Sanil, A.P.: A framework for evaluating the utility of data altered to protect confidentiality. The American Statistician 60(3), 224–232 (2006) https://doi.org/10.1198/000313006x124640 Zhu et al. [2022] Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) 
IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). https://doi.org/10.1109/BigData55660.2022.10020639 Villani [2009] Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. [2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. 
[2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Dankar, F.K., Ibrahim, M.K., Ismail, L.: A multi-dimensional evaluation of synthetic data generators. IEEE Access 10, 11147–11158 (2022) https://doi.org/10.1109/access.2022.3144765 Figueira and Vaz [2022] Figueira, A., Vaz, B.: Survey on synthetic data generation, evaluation methods and GANs. Mathematics 10(15), 2733 (2022) https://doi.org/10.3390/math10152733 Dwork and Roth [2013] Dwork, C., Roth, A.: The algorithmic foundations of differential privacy. Foundations and Trends® in Theoretical Computer Science 9(3-4), 211–487 (2013) https://doi.org/10.1561/0400000042 Yale et al. 
[2020] Yale, A., Dash, S., Bhanot, K., Guyon, I., Erickson, J.S., Bennett, K.P.: Synthesizing quality open data assets from private health research studies. In: Abramowicz, W., Klein, G. (eds.) Business Information Systems Workshops - BIS 2020 International Workshops, Colorado Springs, CO, USA, June 8-10, 2020, Revised Selected Papers. Lecture Notes in Business Information Processing, vol. 394, pp. 324–335 (2020). https://doi.org/10.1007/978-3-030-61146-0_26 Lenz et al. [2021] Lenz, S., Hess, M., Binder, H.: Deep generative models in DataSHIELD. BMC Medical Research Methodology 21(1) (2021) https://doi.org/10.1186/s12874-021-01237-6 Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X., El-Hussuna, A.: Utility metrics for evaluating synthetic health data generation methods: Validation study. JMIR Medical Informatics 10(4), 35734 (2022) https://doi.org/10.2196/35734 Qian et al. [2023] Qian, Z., Cebere, B., Schaar, M.: Synthcity: facilitating innovative use cases of synthetic data in different data modalities. arXiv, preprint (2023). https://doi.org/10.48550/arXiv.2301.07573 DataCebo, Inc. [2023] DataCebo, Inc.: Synthetic Data Metrics. (2023). DataCebo, Inc.. Version 0.9.3. https://docs.sdv.dev/sdmetrics/ Brenninkmeijer [2021] Brenninkmeijer, B.: Table Evaluator. GitHub (2021) Yale et al. [2019] Yale, A., Dash, S., Dutta, R., Guyon, I., Pavao, A., Bennett, K.P.: Privacy preserving synthetic health data. In: 27th European Symposium on Artificial Neural Networks, ESANN 2019, Bruges, Belgium, April 24-26, 2019 (2019) Yan et al. [2022] Yan, C., Yan, Y., Wan, Z., Zhang, Z., Omberg, L., Guinney, J., Mooney, S.D., Malin, B.A.: A multifaceted benchmarking of synthetic electronic health record generation models. Nature Communications 13(1) (2022) https://doi.org/10.1038/s41467-022-35295-1 Murtaza et al. [2023] Murtaza, H., Ahmed, M., Khan, N.F., Murtaza, G., Zafar, S., Bano, A.: Synthetic data generation: State of the art in health care domain. 
Computer Science Review 48, 100546 (2023) https://doi.org/10.1016/j.cosrev.2023.100546 [22] Lautrup, A.D., Hyrup, T., Zimek, A., Schneider-Kamp, P.: Systematised review of generative modelling tools and utility metrics for fully synthetic tabular data. [under review] Gower [1971] Gower, J.C.: A general coefficient of similarity and some of its properties. Biometrics 27(4), 857–871 (1971) https://doi.org/10.2307/2528823 Kamal et al. [2019] Kamal, S., ElEleimy, M., Hegazy, D., Nasr, M.: Hepatitis C Virus (HCV) for Egyptian patients. UCI Machine Learning Repository, dataset (2019). https://doi.org/10.24432/c5989v Chawla et al. [2002] Chawla, N.V., Bowyer, K.W., Hall, L.O., Kegelmeyer, W.P.: SMOTE: synthetic minority over-sampling technique. J. Artif. Intell. Res. 16, 321–357 (2002) https://doi.org/10.1613/jair.953 Yoon et al. [2020] Yoon, J., Drumright, L.N., Schaar, M.: Anonymization through data synthesis using generative adversarial networks (ADS-GAN). IEEE J. Biomed. Health Informatics 24(8), 2378–2388 (2020) https://doi.org/10.1109/JBHI.2020.2980262 Xu et al. [2019] Xu, L., Skoularidou, M., Cuesta-Infante, A., Veeramachaneni, K.: Modeling tabular data using conditional GAN. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 7333–7343 (2019) Goodfellow et al. [2014] Goodfellow, I.J., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., Bengio, Y.: Generative Adversarial Networks. arXiv, preprint (2014). https://doi.org/10.48550/arXiv.1406.2661 Sun and Erath [2015] Sun, L., Erath, A.: A bayesian network approach for population synthesis. 
Transportation Research Part C: Emerging Technologies 61, 49–62 (2015) https://doi.org/10.1016/j.trc.2015.10.010 Reiter [2005] Reiter, J.P.: Using cart to generate partially synthetic public use microdata. Journal of Official Statistics 21(3), 441–462 (2005) Drechsler and Reiter [2011] Drechsler, J., Reiter, J.P.: An empirical evaluation of easily implemented, nonparametric methods for generating synthetic datasets. Comput. Stat. Data Anal. 55(12), 3232–3243 (2011) European Medicines Agency [2018] European Medicines Agency: External guidance on the implementation of the European Medicines Agency policy on the publication of clinical data for medicinal products for human use. https://www.ema.europa.eu/en/human-regulatory/marketing-authorisation/clinical-data-publication/support-industry/external-guidance-implementation-european-medicines-agency-policy-publication-clinical-data (2018) Health Canada [2019] Health Canada: Public Release of Clinical Information: guidance document. https://www.canada.ca/en/health-canada/services/drug-health-product-review-approval/profile-public-release-clinical-information-guidance/document.html (2019) Karr et al. [2006] Karr, A.F., Kohnen, C.N., Oganian, A., Reiter, J.P., Sanil, A.P.: A framework for evaluating the utility of data altered to protect confidentiality. The American Statistician 60(3), 224–232 (2006) https://doi.org/10.1198/000313006x124640 Zhu et al. [2022] Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). https://doi.org/10.1109/BigData55660.2022.10020639 Villani [2009] Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. 
[2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). 
https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Figueira, A., Vaz, B.: Survey on synthetic data generation, evaluation methods and GANs. Mathematics 10(15), 2733 (2022) https://doi.org/10.3390/math10152733 Dwork and Roth [2013] Dwork, C., Roth, A.: The algorithmic foundations of differential privacy. Foundations and Trends® in Theoretical Computer Science 9(3-4), 211–487 (2013) https://doi.org/10.1561/0400000042 Yale et al. [2020] Yale, A., Dash, S., Bhanot, K., Guyon, I., Erickson, J.S., Bennett, K.P.: Synthesizing quality open data assets from private health research studies. In: Abramowicz, W., Klein, G. (eds.) Business Information Systems Workshops - BIS 2020 International Workshops, Colorado Springs, CO, USA, June 8-10, 2020, Revised Selected Papers. Lecture Notes in Business Information Processing, vol. 394, pp. 324–335 (2020). https://doi.org/10.1007/978-3-030-61146-0_26 Lenz et al. [2021] Lenz, S., Hess, M., Binder, H.: Deep generative models in DataSHIELD. BMC Medical Research Methodology 21(1) (2021) https://doi.org/10.1186/s12874-021-01237-6 Emam et al. 
[2022] Emam, K.E., Mosquera, L., Fang, X., El-Hussuna, A.: Utility metrics for evaluating synthetic health data generation methods: Validation study. JMIR Medical Informatics 10(4), 35734 (2022) https://doi.org/10.2196/35734 Qian et al. [2023] Qian, Z., Cebere, B., Schaar, M.: Synthcity: facilitating innovative use cases of synthetic data in different data modalities. arXiv, preprint (2023). https://doi.org/10.48550/arXiv.2301.07573 DataCebo, Inc. [2023] DataCebo, Inc.: Synthetic Data Metrics. (2023). DataCebo, Inc.. Version 0.9.3. https://docs.sdv.dev/sdmetrics/ Brenninkmeijer [2021] Brenninkmeijer, B.: Table Evaluator. GitHub (2021) Yale et al. [2019] Yale, A., Dash, S., Dutta, R., Guyon, I., Pavao, A., Bennett, K.P.: Privacy preserving synthetic health data. In: 27th European Symposium on Artificial Neural Networks, ESANN 2019, Bruges, Belgium, April 24-26, 2019 (2019) Yan et al. [2022] Yan, C., Yan, Y., Wan, Z., Zhang, Z., Omberg, L., Guinney, J., Mooney, S.D., Malin, B.A.: A multifaceted benchmarking of synthetic electronic health record generation models. Nature Communications 13(1) (2022) https://doi.org/10.1038/s41467-022-35295-1 Murtaza et al. [2023] Murtaza, H., Ahmed, M., Khan, N.F., Murtaza, G., Zafar, S., Bano, A.: Synthetic data generation: State of the art in health care domain. Computer Science Review 48, 100546 (2023) https://doi.org/10.1016/j.cosrev.2023.100546 [22] Lautrup, A.D., Hyrup, T., Zimek, A., Schneider-Kamp, P.: Systematised review of generative modelling tools and utility metrics for fully synthetic tabular data. [under review] Gower [1971] Gower, J.C.: A general coefficient of similarity and some of its properties. Biometrics 27(4), 857–871 (1971) https://doi.org/10.2307/2528823 Kamal et al. [2019] Kamal, S., ElEleimy, M., Hegazy, D., Nasr, M.: Hepatitis C Virus (HCV) for Egyptian patients. UCI Machine Learning Repository, dataset (2019). https://doi.org/10.24432/c5989v Chawla et al. 
[2002] Chawla, N.V., Bowyer, K.W., Hall, L.O., Kegelmeyer, W.P.: SMOTE: synthetic minority over-sampling technique. J. Artif. Intell. Res. 16, 321–357 (2002) https://doi.org/10.1613/jair.953 Yoon et al. [2020] Yoon, J., Drumright, L.N., Schaar, M.: Anonymization through data synthesis using generative adversarial networks (ADS-GAN). IEEE J. Biomed. Health Informatics 24(8), 2378–2388 (2020) https://doi.org/10.1109/JBHI.2020.2980262 Xu et al. [2019] Xu, L., Skoularidou, M., Cuesta-Infante, A., Veeramachaneni, K.: Modeling tabular data using conditional GAN. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 7333–7343 (2019) Goodfellow et al. [2014] Goodfellow, I.J., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., Bengio, Y.: Generative Adversarial Networks. arXiv, preprint (2014). https://doi.org/10.48550/arXiv.1406.2661 Sun and Erath [2015] Sun, L., Erath, A.: A bayesian network approach for population synthesis. Transportation Research Part C: Emerging Technologies 61, 49–62 (2015) https://doi.org/10.1016/j.trc.2015.10.010 Reiter [2005] Reiter, J.P.: Using cart to generate partially synthetic public use microdata. Journal of Official Statistics 21(3), 441–462 (2005) Drechsler and Reiter [2011] Drechsler, J., Reiter, J.P.: An empirical evaluation of easily implemented, nonparametric methods for generating synthetic datasets. Comput. Stat. Data Anal. 55(12), 3232–3243 (2011) European Medicines Agency [2018] European Medicines Agency: External guidance on the implementation of the European Medicines Agency policy on the publication of clinical data for medicinal products for human use. 
https://www.ema.europa.eu/en/human-regulatory/marketing-authorisation/clinical-data-publication/support-industry/external-guidance-implementation-european-medicines-agency-policy-publication-clinical-data (2018) Health Canada [2019] Health Canada: Public Release of Clinical Information: guidance document. https://www.canada.ca/en/health-canada/services/drug-health-product-review-approval/profile-public-release-clinical-information-guidance/document.html (2019) Karr et al. [2006] Karr, A.F., Kohnen, C.N., Oganian, A., Reiter, J.P., Sanil, A.P.: A framework for evaluating the utility of data altered to protect confidentiality. The American Statistician 60(3), 224–232 (2006) https://doi.org/10.1198/000313006x124640 Zhu et al. [2022] Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). https://doi.org/10.1109/BigData55660.2022.10020639 Villani [2009] Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. [2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. 
[2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. 
[2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Dwork, C., Roth, A.: The algorithmic foundations of differential privacy. Foundations and Trends® in Theoretical Computer Science 9(3-4), 211–487 (2013) https://doi.org/10.1561/0400000042 Yale et al. [2020] Yale, A., Dash, S., Bhanot, K., Guyon, I., Erickson, J.S., Bennett, K.P.: Synthesizing quality open data assets from private health research studies. In: Abramowicz, W., Klein, G. (eds.) Business Information Systems Workshops - BIS 2020 International Workshops, Colorado Springs, CO, USA, June 8-10, 2020, Revised Selected Papers. Lecture Notes in Business Information Processing, vol. 394, pp. 324–335 (2020). https://doi.org/10.1007/978-3-030-61146-0_26 Lenz et al. [2021] Lenz, S., Hess, M., Binder, H.: Deep generative models in DataSHIELD. BMC Medical Research Methodology 21(1) (2021) https://doi.org/10.1186/s12874-021-01237-6 Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X., El-Hussuna, A.: Utility metrics for evaluating synthetic health data generation methods: Validation study. JMIR Medical Informatics 10(4), 35734 (2022) https://doi.org/10.2196/35734 Qian et al. [2023] Qian, Z., Cebere, B., Schaar, M.: Synthcity: facilitating innovative use cases of synthetic data in different data modalities. arXiv, preprint (2023). https://doi.org/10.48550/arXiv.2301.07573 DataCebo, Inc. [2023] DataCebo, Inc.: Synthetic Data Metrics. (2023). DataCebo, Inc.. Version 0.9.3. https://docs.sdv.dev/sdmetrics/ Brenninkmeijer [2021] Brenninkmeijer, B.: Table Evaluator. GitHub (2021) Yale et al. [2019] Yale, A., Dash, S., Dutta, R., Guyon, I., Pavao, A., Bennett, K.P.: Privacy preserving synthetic health data. In: 27th European Symposium on Artificial Neural Networks, ESANN 2019, Bruges, Belgium, April 24-26, 2019 (2019) Yan et al. 
[2022] Yan, C., Yan, Y., Wan, Z., Zhang, Z., Omberg, L., Guinney, J., Mooney, S.D., Malin, B.A.: A multifaceted benchmarking of synthetic electronic health record generation models. Nature Communications 13(1) (2022) https://doi.org/10.1038/s41467-022-35295-1 Murtaza et al. [2023] Murtaza, H., Ahmed, M., Khan, N.F., Murtaza, G., Zafar, S., Bano, A.: Synthetic data generation: State of the art in health care domain. Computer Science Review 48, 100546 (2023) https://doi.org/10.1016/j.cosrev.2023.100546 [22] Lautrup, A.D., Hyrup, T., Zimek, A., Schneider-Kamp, P.: Systematised review of generative modelling tools and utility metrics for fully synthetic tabular data. [under review] Gower [1971] Gower, J.C.: A general coefficient of similarity and some of its properties. Biometrics 27(4), 857–871 (1971) https://doi.org/10.2307/2528823 Kamal et al. [2019] Kamal, S., ElEleimy, M., Hegazy, D., Nasr, M.: Hepatitis C Virus (HCV) for Egyptian patients. UCI Machine Learning Repository, dataset (2019). https://doi.org/10.24432/c5989v Chawla et al. [2002] Chawla, N.V., Bowyer, K.W., Hall, L.O., Kegelmeyer, W.P.: SMOTE: synthetic minority over-sampling technique. J. Artif. Intell. Res. 16, 321–357 (2002) https://doi.org/10.1613/jair.953 Yoon et al. [2020] Yoon, J., Drumright, L.N., Schaar, M.: Anonymization through data synthesis using generative adversarial networks (ADS-GAN). IEEE J. Biomed. Health Informatics 24(8), 2378–2388 (2020) https://doi.org/10.1109/JBHI.2020.2980262 Xu et al. [2019] Xu, L., Skoularidou, M., Cuesta-Infante, A., Veeramachaneni, K.: Modeling tabular data using conditional GAN. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 7333–7343 (2019) Goodfellow et al. 
[2014] Goodfellow, I.J., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., Bengio, Y.: Generative Adversarial Networks. arXiv, preprint (2014). https://doi.org/10.48550/arXiv.1406.2661 Sun and Erath [2015] Sun, L., Erath, A.: A bayesian network approach for population synthesis. Transportation Research Part C: Emerging Technologies 61, 49–62 (2015) https://doi.org/10.1016/j.trc.2015.10.010 Reiter [2005] Reiter, J.P.: Using cart to generate partially synthetic public use microdata. Journal of Official Statistics 21(3), 441–462 (2005) Drechsler and Reiter [2011] Drechsler, J., Reiter, J.P.: An empirical evaluation of easily implemented, nonparametric methods for generating synthetic datasets. Comput. Stat. Data Anal. 55(12), 3232–3243 (2011) European Medicines Agency [2018] European Medicines Agency: External guidance on the implementation of the European Medicines Agency policy on the publication of clinical data for medicinal products for human use. https://www.ema.europa.eu/en/human-regulatory/marketing-authorisation/clinical-data-publication/support-industry/external-guidance-implementation-european-medicines-agency-policy-publication-clinical-data (2018) Health Canada [2019] Health Canada: Public Release of Clinical Information: guidance document. https://www.canada.ca/en/health-canada/services/drug-health-product-review-approval/profile-public-release-clinical-information-guidance/document.html (2019) Karr et al. [2006] Karr, A.F., Kohnen, C.N., Oganian, A., Reiter, J.P., Sanil, A.P.: A framework for evaluating the utility of data altered to protect confidentiality. The American Statistician 60(3), 224–232 (2006) https://doi.org/10.1198/000313006x124640 Zhu et al. [2022] Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) 
IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). https://doi.org/10.1109/BigData55660.2022.10020639 Villani [2009] Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. [2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. 
[2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Yale, A., Dash, S., Bhanot, K., Guyon, I., Erickson, J.S., Bennett, K.P.: Synthesizing quality open data assets from private health research studies. In: Abramowicz, W., Klein, G. (eds.) Business Information Systems Workshops - BIS 2020 International Workshops, Colorado Springs, CO, USA, June 8-10, 2020, Revised Selected Papers. Lecture Notes in Business Information Processing, vol. 394, pp. 324–335 (2020). https://doi.org/10.1007/978-3-030-61146-0_26 Lenz et al. [2021] Lenz, S., Hess, M., Binder, H.: Deep generative models in DataSHIELD. BMC Medical Research Methodology 21(1) (2021) https://doi.org/10.1186/s12874-021-01237-6 Emam et al. 
[2022] Emam, K.E., Mosquera, L., Fang, X., El-Hussuna, A.: Utility metrics for evaluating synthetic health data generation methods: Validation study. JMIR Medical Informatics 10(4), 35734 (2022) https://doi.org/10.2196/35734 Qian et al. [2023] Qian, Z., Cebere, B., Schaar, M.: Synthcity: facilitating innovative use cases of synthetic data in different data modalities. arXiv, preprint (2023). https://doi.org/10.48550/arXiv.2301.07573 DataCebo, Inc. [2023] DataCebo, Inc.: Synthetic Data Metrics. (2023). DataCebo, Inc.. Version 0.9.3. https://docs.sdv.dev/sdmetrics/ Brenninkmeijer [2021] Brenninkmeijer, B.: Table Evaluator. GitHub (2021) Yale et al. [2019] Yale, A., Dash, S., Dutta, R., Guyon, I., Pavao, A., Bennett, K.P.: Privacy preserving synthetic health data. In: 27th European Symposium on Artificial Neural Networks, ESANN 2019, Bruges, Belgium, April 24-26, 2019 (2019) Yan et al. [2022] Yan, C., Yan, Y., Wan, Z., Zhang, Z., Omberg, L., Guinney, J., Mooney, S.D., Malin, B.A.: A multifaceted benchmarking of synthetic electronic health record generation models. Nature Communications 13(1) (2022) https://doi.org/10.1038/s41467-022-35295-1 Murtaza et al. [2023] Murtaza, H., Ahmed, M., Khan, N.F., Murtaza, G., Zafar, S., Bano, A.: Synthetic data generation: State of the art in health care domain. Computer Science Review 48, 100546 (2023) https://doi.org/10.1016/j.cosrev.2023.100546 [22] Lautrup, A.D., Hyrup, T., Zimek, A., Schneider-Kamp, P.: Systematised review of generative modelling tools and utility metrics for fully synthetic tabular data. [under review] Gower [1971] Gower, J.C.: A general coefficient of similarity and some of its properties. Biometrics 27(4), 857–871 (1971) https://doi.org/10.2307/2528823 Kamal et al. [2019] Kamal, S., ElEleimy, M., Hegazy, D., Nasr, M.: Hepatitis C Virus (HCV) for Egyptian patients. UCI Machine Learning Repository, dataset (2019). https://doi.org/10.24432/c5989v Chawla et al. 
[2002] Chawla, N.V., Bowyer, K.W., Hall, L.O., Kegelmeyer, W.P.: SMOTE: synthetic minority over-sampling technique. J. Artif. Intell. Res. 16, 321–357 (2002) https://doi.org/10.1613/jair.953 Yoon et al. [2020] Yoon, J., Drumright, L.N., Schaar, M.: Anonymization through data synthesis using generative adversarial networks (ADS-GAN). IEEE J. Biomed. Health Informatics 24(8), 2378–2388 (2020) https://doi.org/10.1109/JBHI.2020.2980262 Xu et al. [2019] Xu, L., Skoularidou, M., Cuesta-Infante, A., Veeramachaneni, K.: Modeling tabular data using conditional GAN. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 7333–7343 (2019) Goodfellow et al. [2014] Goodfellow, I.J., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., Bengio, Y.: Generative Adversarial Networks. arXiv, preprint (2014). https://doi.org/10.48550/arXiv.1406.2661 Sun and Erath [2015] Sun, L., Erath, A.: A bayesian network approach for population synthesis. Transportation Research Part C: Emerging Technologies 61, 49–62 (2015) https://doi.org/10.1016/j.trc.2015.10.010 Reiter [2005] Reiter, J.P.: Using cart to generate partially synthetic public use microdata. Journal of Official Statistics 21(3), 441–462 (2005) Drechsler and Reiter [2011] Drechsler, J., Reiter, J.P.: An empirical evaluation of easily implemented, nonparametric methods for generating synthetic datasets. Comput. Stat. Data Anal. 55(12), 3232–3243 (2011) European Medicines Agency [2018] European Medicines Agency: External guidance on the implementation of the European Medicines Agency policy on the publication of clinical data for medicinal products for human use. 
https://www.ema.europa.eu/en/human-regulatory/marketing-authorisation/clinical-data-publication/support-industry/external-guidance-implementation-european-medicines-agency-policy-publication-clinical-data (2018) Health Canada [2019] Health Canada: Public Release of Clinical Information: guidance document. https://www.canada.ca/en/health-canada/services/drug-health-product-review-approval/profile-public-release-clinical-information-guidance/document.html (2019) Karr et al. [2006] Karr, A.F., Kohnen, C.N., Oganian, A., Reiter, J.P., Sanil, A.P.: A framework for evaluating the utility of data altered to protect confidentiality. The American Statistician 60(3), 224–232 (2006) https://doi.org/10.1198/000313006x124640 Zhu et al. [2022] Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). https://doi.org/10.1109/BigData55660.2022.10020639 Villani [2009] Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. [2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. 
[2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. 
[2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Lenz, S., Hess, M., Binder, H.: Deep generative models in DataSHIELD. BMC Medical Research Methodology 21(1) (2021) https://doi.org/10.1186/s12874-021-01237-6 Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X., El-Hussuna, A.: Utility metrics for evaluating synthetic health data generation methods: Validation study. JMIR Medical Informatics 10(4), 35734 (2022) https://doi.org/10.2196/35734 Qian et al. [2023] Qian, Z., Cebere, B., Schaar, M.: Synthcity: facilitating innovative use cases of synthetic data in different data modalities. arXiv, preprint (2023). https://doi.org/10.48550/arXiv.2301.07573 DataCebo, Inc. [2023] DataCebo, Inc.: Synthetic Data Metrics. (2023). DataCebo, Inc.. Version 0.9.3. https://docs.sdv.dev/sdmetrics/ Brenninkmeijer [2021] Brenninkmeijer, B.: Table Evaluator. GitHub (2021) Yale et al. [2019] Yale, A., Dash, S., Dutta, R., Guyon, I., Pavao, A., Bennett, K.P.: Privacy preserving synthetic health data. In: 27th European Symposium on Artificial Neural Networks, ESANN 2019, Bruges, Belgium, April 24-26, 2019 (2019) Yan et al. [2022] Yan, C., Yan, Y., Wan, Z., Zhang, Z., Omberg, L., Guinney, J., Mooney, S.D., Malin, B.A.: A multifaceted benchmarking of synthetic electronic health record generation models. Nature Communications 13(1) (2022) https://doi.org/10.1038/s41467-022-35295-1 Murtaza et al. [2023] Murtaza, H., Ahmed, M., Khan, N.F., Murtaza, G., Zafar, S., Bano, A.: Synthetic data generation: State of the art in health care domain. Computer Science Review 48, 100546 (2023) https://doi.org/10.1016/j.cosrev.2023.100546 [22] Lautrup, A.D., Hyrup, T., Zimek, A., Schneider-Kamp, P.: Systematised review of generative modelling tools and utility metrics for fully synthetic tabular data. 
[under review] Gower [1971] Gower, J.C.: A general coefficient of similarity and some of its properties. Biometrics 27(4), 857–871 (1971) https://doi.org/10.2307/2528823 Kamal et al. [2019] Kamal, S., ElEleimy, M., Hegazy, D., Nasr, M.: Hepatitis C Virus (HCV) for Egyptian patients. UCI Machine Learning Repository, dataset (2019). https://doi.org/10.24432/c5989v Chawla et al. [2002] Chawla, N.V., Bowyer, K.W., Hall, L.O., Kegelmeyer, W.P.: SMOTE: synthetic minority over-sampling technique. J. Artif. Intell. Res. 16, 321–357 (2002) https://doi.org/10.1613/jair.953 Yoon et al. [2020] Yoon, J., Drumright, L.N., Schaar, M.: Anonymization through data synthesis using generative adversarial networks (ADS-GAN). IEEE J. Biomed. Health Informatics 24(8), 2378–2388 (2020) https://doi.org/10.1109/JBHI.2020.2980262 Xu et al. [2019] Xu, L., Skoularidou, M., Cuesta-Infante, A., Veeramachaneni, K.: Modeling tabular data using conditional GAN. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 7333–7343 (2019) Goodfellow et al. [2014] Goodfellow, I.J., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., Bengio, Y.: Generative Adversarial Networks. arXiv, preprint (2014). https://doi.org/10.48550/arXiv.1406.2661 Sun and Erath [2015] Sun, L., Erath, A.: A bayesian network approach for population synthesis. Transportation Research Part C: Emerging Technologies 61, 49–62 (2015) https://doi.org/10.1016/j.trc.2015.10.010 Reiter [2005] Reiter, J.P.: Using cart to generate partially synthetic public use microdata. Journal of Official Statistics 21(3), 441–462 (2005) Drechsler and Reiter [2011] Drechsler, J., Reiter, J.P.: An empirical evaluation of easily implemented, nonparametric methods for generating synthetic datasets. Comput. 
[2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Yale, A., Dash, S., Dutta, R., Guyon, I., Pavao, A., Bennett, K.P.: Privacy preserving synthetic health data. In: 27th European Symposium on Artificial Neural Networks, ESANN 2019, Bruges, Belgium, April 24-26, 2019 (2019) Yan et al. [2022] Yan, C., Yan, Y., Wan, Z., Zhang, Z., Omberg, L., Guinney, J., Mooney, S.D., Malin, B.A.: A multifaceted benchmarking of synthetic electronic health record generation models. Nature Communications 13(1) (2022) https://doi.org/10.1038/s41467-022-35295-1 Murtaza et al. [2023] Murtaza, H., Ahmed, M., Khan, N.F., Murtaza, G., Zafar, S., Bano, A.: Synthetic data generation: State of the art in health care domain. Computer Science Review 48, 100546 (2023) https://doi.org/10.1016/j.cosrev.2023.100546 [22] Lautrup, A.D., Hyrup, T., Zimek, A., Schneider-Kamp, P.: Systematised review of generative modelling tools and utility metrics for fully synthetic tabular data. [under review] Gower [1971] Gower, J.C.: A general coefficient of similarity and some of its properties. Biometrics 27(4), 857–871 (1971) https://doi.org/10.2307/2528823 Kamal et al. [2019] Kamal, S., ElEleimy, M., Hegazy, D., Nasr, M.: Hepatitis C Virus (HCV) for Egyptian patients. UCI Machine Learning Repository, dataset (2019). https://doi.org/10.24432/c5989v Chawla et al. [2002] Chawla, N.V., Bowyer, K.W., Hall, L.O., Kegelmeyer, W.P.: SMOTE: synthetic minority over-sampling technique. J. Artif. Intell. Res. 16, 321–357 (2002) https://doi.org/10.1613/jair.953 Yoon et al. 
[2020] Yoon, J., Drumright, L.N., Schaar, M.: Anonymization through data synthesis using generative adversarial networks (ADS-GAN). IEEE J. Biomed. Health Informatics 24(8), 2378–2388 (2020) https://doi.org/10.1109/JBHI.2020.2980262 Xu et al. [2019] Xu, L., Skoularidou, M., Cuesta-Infante, A., Veeramachaneni, K.: Modeling tabular data using conditional GAN. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 7333–7343 (2019) Goodfellow et al. [2014] Goodfellow, I.J., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., Bengio, Y.: Generative Adversarial Networks. arXiv, preprint (2014). https://doi.org/10.48550/arXiv.1406.2661 Sun and Erath [2015] Sun, L., Erath, A.: A bayesian network approach for population synthesis. Transportation Research Part C: Emerging Technologies 61, 49–62 (2015) https://doi.org/10.1016/j.trc.2015.10.010 Reiter [2005] Reiter, J.P.: Using cart to generate partially synthetic public use microdata. Journal of Official Statistics 21(3), 441–462 (2005) Drechsler and Reiter [2011] Drechsler, J., Reiter, J.P.: An empirical evaluation of easily implemented, nonparametric methods for generating synthetic datasets. Comput. Stat. Data Anal. 55(12), 3232–3243 (2011) European Medicines Agency [2018] European Medicines Agency: External guidance on the implementation of the European Medicines Agency policy on the publication of clinical data for medicinal products for human use. https://www.ema.europa.eu/en/human-regulatory/marketing-authorisation/clinical-data-publication/support-industry/external-guidance-implementation-european-medicines-agency-policy-publication-clinical-data (2018) Health Canada [2019] Health Canada: Public Release of Clinical Information: guidance document. 
https://www.ema.europa.eu/en/human-regulatory/marketing-authorisation/clinical-data-publication/support-industry/external-guidance-implementation-european-medicines-agency-policy-publication-clinical-data (2018) Health Canada [2019] Health Canada: Public Release of Clinical Information: guidance document. https://www.canada.ca/en/health-canada/services/drug-health-product-review-approval/profile-public-release-clinical-information-guidance/document.html (2019) Karr et al. [2006] Karr, A.F., Kohnen, C.N., Oganian, A., Reiter, J.P., Sanil, A.P.: A framework for evaluating the utility of data altered to protect confidentiality. The American Statistician 60(3), 224–232 (2006) https://doi.org/10.1198/000313006x124640 Zhu et al. [2022] Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). https://doi.org/10.1109/BigData55660.2022.10020639 Villani [2009] Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. [2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. 
[2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. 
[2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Chawla, N.V., Bowyer, K.W., Hall, L.O., Kegelmeyer, W.P.: SMOTE: synthetic minority over-sampling technique. J. Artif. Intell. Res. 16, 321–357 (2002) https://doi.org/10.1613/jair.953 Yoon et al. [2020] Yoon, J., Drumright, L.N., Schaar, M.: Anonymization through data synthesis using generative adversarial networks (ADS-GAN). IEEE J. Biomed. Health Informatics 24(8), 2378–2388 (2020) https://doi.org/10.1109/JBHI.2020.2980262 Xu et al. [2019] Xu, L., Skoularidou, M., Cuesta-Infante, A., Veeramachaneni, K.: Modeling tabular data using conditional GAN. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 7333–7343 (2019) Goodfellow et al. [2014] Goodfellow, I.J., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., Bengio, Y.: Generative Adversarial Networks. arXiv, preprint (2014). https://doi.org/10.48550/arXiv.1406.2661 Sun and Erath [2015] Sun, L., Erath, A.: A bayesian network approach for population synthesis. Transportation Research Part C: Emerging Technologies 61, 49–62 (2015) https://doi.org/10.1016/j.trc.2015.10.010 Reiter [2005] Reiter, J.P.: Using cart to generate partially synthetic public use microdata. Journal of Official Statistics 21(3), 441–462 (2005) Drechsler and Reiter [2011] Drechsler, J., Reiter, J.P.: An empirical evaluation of easily implemented, nonparametric methods for generating synthetic datasets. Comput. Stat. Data Anal. 
55(12), 3232–3243 (2011) European Medicines Agency [2018] European Medicines Agency: External guidance on the implementation of the European Medicines Agency policy on the publication of clinical data for medicinal products for human use. https://www.ema.europa.eu/en/human-regulatory/marketing-authorisation/clinical-data-publication/support-industry/external-guidance-implementation-european-medicines-agency-policy-publication-clinical-data (2018) Health Canada [2019] Health Canada: Public Release of Clinical Information: guidance document. https://www.canada.ca/en/health-canada/services/drug-health-product-review-approval/profile-public-release-clinical-information-guidance/document.html (2019) Karr et al. [2006] Karr, A.F., Kohnen, C.N., Oganian, A., Reiter, J.P., Sanil, A.P.: A framework for evaluating the utility of data altered to protect confidentiality. The American Statistician 60(3), 224–232 (2006) https://doi.org/10.1198/000313006x124640 Zhu et al. [2022] Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). https://doi.org/10.1109/BigData55660.2022.10020639 Villani [2009] Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. [2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. 
IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. 
[2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Yoon, J., Drumright, L.N., Schaar, M.: Anonymization through data synthesis using generative adversarial networks (ADS-GAN). IEEE J. Biomed. Health Informatics 24(8), 2378–2388 (2020) https://doi.org/10.1109/JBHI.2020.2980262 Xu et al. [2019] Xu, L., Skoularidou, M., Cuesta-Infante, A., Veeramachaneni, K.: Modeling tabular data using conditional GAN. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 7333–7343 (2019) Goodfellow et al. [2014] Goodfellow, I.J., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., Bengio, Y.: Generative Adversarial Networks. arXiv, preprint (2014). https://doi.org/10.48550/arXiv.1406.2661 Sun and Erath [2015] Sun, L., Erath, A.: A bayesian network approach for population synthesis. Transportation Research Part C: Emerging Technologies 61, 49–62 (2015) https://doi.org/10.1016/j.trc.2015.10.010 Reiter [2005] Reiter, J.P.: Using cart to generate partially synthetic public use microdata. Journal of Official Statistics 21(3), 441–462 (2005) Drechsler and Reiter [2011] Drechsler, J., Reiter, J.P.: An empirical evaluation of easily implemented, nonparametric methods for generating synthetic datasets. Comput. Stat. Data Anal. 
55(12), 3232–3243 (2011) European Medicines Agency [2018] European Medicines Agency: External guidance on the implementation of the European Medicines Agency policy on the publication of clinical data for medicinal products for human use. https://www.ema.europa.eu/en/human-regulatory/marketing-authorisation/clinical-data-publication/support-industry/external-guidance-implementation-european-medicines-agency-policy-publication-clinical-data (2018) Health Canada [2019] Health Canada: Public Release of Clinical Information: guidance document. https://www.canada.ca/en/health-canada/services/drug-health-product-review-approval/profile-public-release-clinical-information-guidance/document.html (2019) Karr et al. [2006] Karr, A.F., Kohnen, C.N., Oganian, A., Reiter, J.P., Sanil, A.P.: A framework for evaluating the utility of data altered to protect confidentiality. The American Statistician 60(3), 224–232 (2006) https://doi.org/10.1198/000313006x124640 Zhu et al. [2022] Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). https://doi.org/10.1109/BigData55660.2022.10020639 Villani [2009] Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. [2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. 
IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. 
[2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Xu, L., Skoularidou, M., Cuesta-Infante, A., Veeramachaneni, K.: Modeling tabular data using conditional GAN. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 7333–7343 (2019) Goodfellow et al. [2014] Goodfellow, I.J., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., Bengio, Y.: Generative Adversarial Networks. arXiv, preprint (2014). https://doi.org/10.48550/arXiv.1406.2661 Sun and Erath [2015] Sun, L., Erath, A.: A bayesian network approach for population synthesis. Transportation Research Part C: Emerging Technologies 61, 49–62 (2015) https://doi.org/10.1016/j.trc.2015.10.010 Reiter [2005] Reiter, J.P.: Using cart to generate partially synthetic public use microdata. Journal of Official Statistics 21(3), 441–462 (2005) Drechsler and Reiter [2011] Drechsler, J., Reiter, J.P.: An empirical evaluation of easily implemented, nonparametric methods for generating synthetic datasets. Comput. Stat. Data Anal. 55(12), 3232–3243 (2011) European Medicines Agency [2018] European Medicines Agency: External guidance on the implementation of the European Medicines Agency policy on the publication of clinical data for medicinal products for human use. 
https://www.ema.europa.eu/en/human-regulatory/marketing-authorisation/clinical-data-publication/support-industry/external-guidance-implementation-european-medicines-agency-policy-publication-clinical-data (2018) Health Canada [2019] Health Canada: Public Release of Clinical Information: guidance document. https://www.canada.ca/en/health-canada/services/drug-health-product-review-approval/profile-public-release-clinical-information-guidance/document.html (2019) Karr et al. [2006] Karr, A.F., Kohnen, C.N., Oganian, A., Reiter, J.P., Sanil, A.P.: A framework for evaluating the utility of data altered to protect confidentiality. The American Statistician 60(3), 224–232 (2006) https://doi.org/10.1198/000313006x124640 Zhu et al. [2022] Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). https://doi.org/10.1109/BigData55660.2022.10020639 Villani [2009] Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. [2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. 
[2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. 
[2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Goodfellow, I.J., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., Bengio, Y.: Generative Adversarial Networks. arXiv, preprint (2014). https://doi.org/10.48550/arXiv.1406.2661 Sun and Erath [2015] Sun, L., Erath, A.: A bayesian network approach for population synthesis. Transportation Research Part C: Emerging Technologies 61, 49–62 (2015) https://doi.org/10.1016/j.trc.2015.10.010 Reiter [2005] Reiter, J.P.: Using cart to generate partially synthetic public use microdata. Journal of Official Statistics 21(3), 441–462 (2005) Drechsler and Reiter [2011] Drechsler, J., Reiter, J.P.: An empirical evaluation of easily implemented, nonparametric methods for generating synthetic datasets. Comput. Stat. Data Anal. 55(12), 3232–3243 (2011) European Medicines Agency [2018] European Medicines Agency: External guidance on the implementation of the European Medicines Agency policy on the publication of clinical data for medicinal products for human use. https://www.ema.europa.eu/en/human-regulatory/marketing-authorisation/clinical-data-publication/support-industry/external-guidance-implementation-european-medicines-agency-policy-publication-clinical-data (2018) Health Canada [2019] Health Canada: Public Release of Clinical Information: guidance document. https://www.canada.ca/en/health-canada/services/drug-health-product-review-approval/profile-public-release-clinical-information-guidance/document.html (2019) Karr et al. [2006] Karr, A.F., Kohnen, C.N., Oganian, A., Reiter, J.P., Sanil, A.P.: A framework for evaluating the utility of data altered to protect confidentiality. The American Statistician 60(3), 224–232 (2006) https://doi.org/10.1198/000313006x124640 Zhu et al. [2022] Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. 
In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). https://doi.org/10.1109/BigData55660.2022.10020639 Villani [2009] Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. [2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 
[2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. 
[2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. 
[2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 
13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. 
AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. 
[2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083
https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Emam, K.E., Mosquera, L., Bass, J.: Evaluating identity disclosure risk in fully synthetic health data: Model development and validation. Journal of Medical Internet Research 22(11), 23139 (2020) https://doi.org/10.2196/23139 Abouelmehdi et al. [2018] Abouelmehdi, K., Beni-Hessane, A., Khaloufi, H.: Big healthcare data: preserving security and privacy. Journal of Big Data 5(1) (2018) https://doi.org/10.1186/s40537-017-0110-7 Nowok et al. [2016] Nowok, B., Raab, G.M., Dibben, C.: synthpop: Bespoke creation of synthetic data in r. Journal of Statistical Software 74(11) (2016) https://doi.org/10.18637/jss.v074.i11 Rankin et al. [2020] Rankin, D., Black, M., Bond, R., Wallace, J., Mulvenna, M., Epelde, G.: Reliability of supervised machine learning using synthetic data in health care: Model to preserve privacy for data sharing. JMIR Medical Informatics 8(7), 18910 (2020) https://doi.org/10.2196/18910 van Breugel et al. 
[2021] Breugel, B., Kyono, T., Berrevoets, J., Schaar, M.: DECAF: generating fair synthetic data using causally-aware generative networks. In: Ranzato, M., Beygelzimer, A., Dauphin, Y.N., Liang, P., Vaughan, J.W. (eds.) Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6-14, 2021, Virtual, pp. 22221–22233 (2021) Dankar et al. [2022] Dankar, F.K., Ibrahim, M.K., Ismail, L.: A multi-dimensional evaluation of synthetic data generators. IEEE Access 10, 11147–11158 (2022) https://doi.org/10.1109/access.2022.3144765 Figueira and Vaz [2022] Figueira, A., Vaz, B.: Survey on synthetic data generation, evaluation methods and GANs. Mathematics 10(15), 2733 (2022) https://doi.org/10.3390/math10152733 Dwork and Roth [2013] Dwork, C., Roth, A.: The algorithmic foundations of differential privacy. Foundations and Trends® in Theoretical Computer Science 9(3-4), 211–487 (2013) https://doi.org/10.1561/0400000042 Yale et al. [2020] Yale, A., Dash, S., Bhanot, K., Guyon, I., Erickson, J.S., Bennett, K.P.: Synthesizing quality open data assets from private health research studies. In: Abramowicz, W., Klein, G. (eds.) Business Information Systems Workshops - BIS 2020 International Workshops, Colorado Springs, CO, USA, June 8-10, 2020, Revised Selected Papers. Lecture Notes in Business Information Processing, vol. 394, pp. 324–335 (2020). https://doi.org/10.1007/978-3-030-61146-0_26 Lenz et al. [2021] Lenz, S., Hess, M., Binder, H.: Deep generative models in DataSHIELD. BMC Medical Research Methodology 21(1) (2021) https://doi.org/10.1186/s12874-021-01237-6 Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X., El-Hussuna, A.: Utility metrics for evaluating synthetic health data generation methods: Validation study. JMIR Medical Informatics 10(4), 35734 (2022) https://doi.org/10.2196/35734 Qian et al. 
[2023] Qian, Z., Cebere, B., Schaar, M.: Synthcity: facilitating innovative use cases of synthetic data in different data modalities. arXiv, preprint (2023). https://doi.org/10.48550/arXiv.2301.07573 DataCebo, Inc. [2023] DataCebo, Inc.: Synthetic Data Metrics. (2023). DataCebo, Inc.. Version 0.9.3. https://docs.sdv.dev/sdmetrics/ Brenninkmeijer [2021] Brenninkmeijer, B.: Table Evaluator. GitHub (2021) Yale et al. [2019] Yale, A., Dash, S., Dutta, R., Guyon, I., Pavao, A., Bennett, K.P.: Privacy preserving synthetic health data. In: 27th European Symposium on Artificial Neural Networks, ESANN 2019, Bruges, Belgium, April 24-26, 2019 (2019) Yan et al. [2022] Yan, C., Yan, Y., Wan, Z., Zhang, Z., Omberg, L., Guinney, J., Mooney, S.D., Malin, B.A.: A multifaceted benchmarking of synthetic electronic health record generation models. Nature Communications 13(1) (2022) https://doi.org/10.1038/s41467-022-35295-1 Murtaza et al. [2023] Murtaza, H., Ahmed, M., Khan, N.F., Murtaza, G., Zafar, S., Bano, A.: Synthetic data generation: State of the art in health care domain. Computer Science Review 48, 100546 (2023) https://doi.org/10.1016/j.cosrev.2023.100546 [22] Lautrup, A.D., Hyrup, T., Zimek, A., Schneider-Kamp, P.: Systematised review of generative modelling tools and utility metrics for fully synthetic tabular data. [under review] Gower [1971] Gower, J.C.: A general coefficient of similarity and some of its properties. Biometrics 27(4), 857–871 (1971) https://doi.org/10.2307/2528823 Kamal et al. [2019] Kamal, S., ElEleimy, M., Hegazy, D., Nasr, M.: Hepatitis C Virus (HCV) for Egyptian patients. UCI Machine Learning Repository, dataset (2019). https://doi.org/10.24432/c5989v Chawla et al. [2002] Chawla, N.V., Bowyer, K.W., Hall, L.O., Kegelmeyer, W.P.: SMOTE: synthetic minority over-sampling technique. J. Artif. Intell. Res. 16, 321–357 (2002) https://doi.org/10.1613/jair.953 Yoon et al. 
[2020] Yoon, J., Drumright, L.N., Schaar, M.: Anonymization through data synthesis using generative adversarial networks (ADS-GAN). IEEE J. Biomed. Health Informatics 24(8), 2378–2388 (2020) https://doi.org/10.1109/JBHI.2020.2980262 Xu et al. [2019] Xu, L., Skoularidou, M., Cuesta-Infante, A., Veeramachaneni, K.: Modeling tabular data using conditional GAN. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 7333–7343 (2019) Goodfellow et al. [2014] Goodfellow, I.J., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., Bengio, Y.: Generative Adversarial Networks. arXiv, preprint (2014). https://doi.org/10.48550/arXiv.1406.2661 Sun and Erath [2015] Sun, L., Erath, A.: A bayesian network approach for population synthesis. Transportation Research Part C: Emerging Technologies 61, 49–62 (2015) https://doi.org/10.1016/j.trc.2015.10.010 Reiter [2005] Reiter, J.P.: Using cart to generate partially synthetic public use microdata. Journal of Official Statistics 21(3), 441–462 (2005) Drechsler and Reiter [2011] Drechsler, J., Reiter, J.P.: An empirical evaluation of easily implemented, nonparametric methods for generating synthetic datasets. Comput. Stat. Data Anal. 55(12), 3232–3243 (2011) European Medicines Agency [2018] European Medicines Agency: External guidance on the implementation of the European Medicines Agency policy on the publication of clinical data for medicinal products for human use. https://www.ema.europa.eu/en/human-regulatory/marketing-authorisation/clinical-data-publication/support-industry/external-guidance-implementation-european-medicines-agency-policy-publication-clinical-data (2018) Health Canada [2019] Health Canada: Public Release of Clinical Information: guidance document. 
https://www.canada.ca/en/health-canada/services/drug-health-product-review-approval/profile-public-release-clinical-information-guidance/document.html (2019) Karr et al. [2006] Karr, A.F., Kohnen, C.N., Oganian, A., Reiter, J.P., Sanil, A.P.: A framework for evaluating the utility of data altered to protect confidentiality. The American Statistician 60(3), 224–232 (2006) https://doi.org/10.1198/000313006x124640 Zhu et al. [2022] Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). https://doi.org/10.1109/BigData55660.2022.10020639 Villani [2009] Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. [2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. 
arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Abouelmehdi, K., Beni-Hessane, A., Khaloufi, H.: Big healthcare data: preserving security and privacy. 
Journal of Big Data 5(1) (2018) https://doi.org/10.1186/s40537-017-0110-7 Nowok et al. [2016] Nowok, B., Raab, G.M., Dibben, C.: synthpop: Bespoke creation of synthetic data in r. Journal of Statistical Software 74(11) (2016) https://doi.org/10.18637/jss.v074.i11 Rankin et al. [2020] Rankin, D., Black, M., Bond, R., Wallace, J., Mulvenna, M., Epelde, G.: Reliability of supervised machine learning using synthetic data in health care: Model to preserve privacy for data sharing. JMIR Medical Informatics 8(7), 18910 (2020) https://doi.org/10.2196/18910 van Breugel et al. [2021] Breugel, B., Kyono, T., Berrevoets, J., Schaar, M.: DECAF: generating fair synthetic data using causally-aware generative networks. In: Ranzato, M., Beygelzimer, A., Dauphin, Y.N., Liang, P., Vaughan, J.W. (eds.) Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6-14, 2021, Virtual, pp. 22221–22233 (2021) Dankar et al. [2022] Dankar, F.K., Ibrahim, M.K., Ismail, L.: A multi-dimensional evaluation of synthetic data generators. IEEE Access 10, 11147–11158 (2022) https://doi.org/10.1109/access.2022.3144765 Figueira and Vaz [2022] Figueira, A., Vaz, B.: Survey on synthetic data generation, evaluation methods and GANs. Mathematics 10(15), 2733 (2022) https://doi.org/10.3390/math10152733 Dwork and Roth [2013] Dwork, C., Roth, A.: The algorithmic foundations of differential privacy. Foundations and Trends® in Theoretical Computer Science 9(3-4), 211–487 (2013) https://doi.org/10.1561/0400000042 Yale et al. [2020] Yale, A., Dash, S., Bhanot, K., Guyon, I., Erickson, J.S., Bennett, K.P.: Synthesizing quality open data assets from private health research studies. In: Abramowicz, W., Klein, G. (eds.) Business Information Systems Workshops - BIS 2020 International Workshops, Colorado Springs, CO, USA, June 8-10, 2020, Revised Selected Papers. Lecture Notes in Business Information Processing, vol. 394, pp. 
324–335 (2020). https://doi.org/10.1007/978-3-030-61146-0_26 Lenz et al. [2021] Lenz, S., Hess, M., Binder, H.: Deep generative models in DataSHIELD. BMC Medical Research Methodology 21(1) (2021) https://doi.org/10.1186/s12874-021-01237-6 Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X., El-Hussuna, A.: Utility metrics for evaluating synthetic health data generation methods: Validation study. JMIR Medical Informatics 10(4), 35734 (2022) https://doi.org/10.2196/35734 Qian et al. [2023] Qian, Z., Cebere, B., Schaar, M.: Synthcity: facilitating innovative use cases of synthetic data in different data modalities. arXiv, preprint (2023). https://doi.org/10.48550/arXiv.2301.07573 DataCebo, Inc. [2023] DataCebo, Inc.: Synthetic Data Metrics. (2023). DataCebo, Inc.. Version 0.9.3. https://docs.sdv.dev/sdmetrics/ Brenninkmeijer [2021] Brenninkmeijer, B.: Table Evaluator. GitHub (2021) Yale et al. [2019] Yale, A., Dash, S., Dutta, R., Guyon, I., Pavao, A., Bennett, K.P.: Privacy preserving synthetic health data. In: 27th European Symposium on Artificial Neural Networks, ESANN 2019, Bruges, Belgium, April 24-26, 2019 (2019) Yan et al. [2022] Yan, C., Yan, Y., Wan, Z., Zhang, Z., Omberg, L., Guinney, J., Mooney, S.D., Malin, B.A.: A multifaceted benchmarking of synthetic electronic health record generation models. Nature Communications 13(1) (2022) https://doi.org/10.1038/s41467-022-35295-1 Murtaza et al. [2023] Murtaza, H., Ahmed, M., Khan, N.F., Murtaza, G., Zafar, S., Bano, A.: Synthetic data generation: State of the art in health care domain. Computer Science Review 48, 100546 (2023) https://doi.org/10.1016/j.cosrev.2023.100546 [22] Lautrup, A.D., Hyrup, T., Zimek, A., Schneider-Kamp, P.: Systematised review of generative modelling tools and utility metrics for fully synthetic tabular data. [under review] Gower [1971] Gower, J.C.: A general coefficient of similarity and some of its properties. 
Biometrics 27(4), 857–871 (1971) https://doi.org/10.2307/2528823 Kamal et al. [2019] Kamal, S., ElEleimy, M., Hegazy, D., Nasr, M.: Hepatitis C Virus (HCV) for Egyptian patients. UCI Machine Learning Repository, dataset (2019). https://doi.org/10.24432/c5989v Chawla et al. [2002] Chawla, N.V., Bowyer, K.W., Hall, L.O., Kegelmeyer, W.P.: SMOTE: synthetic minority over-sampling technique. J. Artif. Intell. Res. 16, 321–357 (2002) https://doi.org/10.1613/jair.953 Yoon et al. [2020] Yoon, J., Drumright, L.N., Schaar, M.: Anonymization through data synthesis using generative adversarial networks (ADS-GAN). IEEE J. Biomed. Health Informatics 24(8), 2378–2388 (2020) https://doi.org/10.1109/JBHI.2020.2980262 Xu et al. [2019] Xu, L., Skoularidou, M., Cuesta-Infante, A., Veeramachaneni, K.: Modeling tabular data using conditional GAN. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 7333–7343 (2019) Goodfellow et al. [2014] Goodfellow, I.J., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., Bengio, Y.: Generative Adversarial Networks. arXiv, preprint (2014). https://doi.org/10.48550/arXiv.1406.2661 Sun and Erath [2015] Sun, L., Erath, A.: A bayesian network approach for population synthesis. Transportation Research Part C: Emerging Technologies 61, 49–62 (2015) https://doi.org/10.1016/j.trc.2015.10.010 Reiter [2005] Reiter, J.P.: Using cart to generate partially synthetic public use microdata. Journal of Official Statistics 21(3), 441–462 (2005) Drechsler and Reiter [2011] Drechsler, J., Reiter, J.P.: An empirical evaluation of easily implemented, nonparametric methods for generating synthetic datasets. Comput. Stat. Data Anal. 
55(12), 3232–3243 (2011) European Medicines Agency [2018] European Medicines Agency: External guidance on the implementation of the European Medicines Agency policy on the publication of clinical data for medicinal products for human use. https://www.ema.europa.eu/en/human-regulatory/marketing-authorisation/clinical-data-publication/support-industry/external-guidance-implementation-european-medicines-agency-policy-publication-clinical-data (2018) Health Canada [2019] Health Canada: Public Release of Clinical Information: guidance document. https://www.canada.ca/en/health-canada/services/drug-health-product-review-approval/profile-public-release-clinical-information-guidance/document.html (2019) Karr et al. [2006] Karr, A.F., Kohnen, C.N., Oganian, A., Reiter, J.P., Sanil, A.P.: A framework for evaluating the utility of data altered to protect confidentiality. The American Statistician 60(3), 224–232 (2006) https://doi.org/10.1198/000313006x124640 Zhu et al. [2022] Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). https://doi.org/10.1109/BigData55660.2022.10020639 Villani [2009] Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. [2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. 
IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. 
[2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Nowok, B., Raab, G.M., Dibben, C.: synthpop: Bespoke creation of synthetic data in r. Journal of Statistical Software 74(11) (2016) https://doi.org/10.18637/jss.v074.i11 Rankin et al. [2020] Rankin, D., Black, M., Bond, R., Wallace, J., Mulvenna, M., Epelde, G.: Reliability of supervised machine learning using synthetic data in health care: Model to preserve privacy for data sharing. JMIR Medical Informatics 8(7), 18910 (2020) https://doi.org/10.2196/18910 van Breugel et al. [2021] Breugel, B., Kyono, T., Berrevoets, J., Schaar, M.: DECAF: generating fair synthetic data using causally-aware generative networks. In: Ranzato, M., Beygelzimer, A., Dauphin, Y.N., Liang, P., Vaughan, J.W. (eds.) Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6-14, 2021, Virtual, pp. 22221–22233 (2021) Dankar et al. [2022] Dankar, F.K., Ibrahim, M.K., Ismail, L.: A multi-dimensional evaluation of synthetic data generators. IEEE Access 10, 11147–11158 (2022) https://doi.org/10.1109/access.2022.3144765 Figueira and Vaz [2022] Figueira, A., Vaz, B.: Survey on synthetic data generation, evaluation methods and GANs. Mathematics 10(15), 2733 (2022) https://doi.org/10.3390/math10152733 Dwork and Roth [2013] Dwork, C., Roth, A.: The algorithmic foundations of differential privacy. Foundations and Trends® in Theoretical Computer Science 9(3-4), 211–487 (2013) https://doi.org/10.1561/0400000042 Yale et al. 
[2020] Yale, A., Dash, S., Bhanot, K., Guyon, I., Erickson, J.S., Bennett, K.P.: Synthesizing quality open data assets from private health research studies. In: Abramowicz, W., Klein, G. (eds.) Business Information Systems Workshops - BIS 2020 International Workshops, Colorado Springs, CO, USA, June 8-10, 2020, Revised Selected Papers. Lecture Notes in Business Information Processing, vol. 394, pp. 324–335 (2020). https://doi.org/10.1007/978-3-030-61146-0_26 Lenz et al. [2021] Lenz, S., Hess, M., Binder, H.: Deep generative models in DataSHIELD. BMC Medical Research Methodology 21(1) (2021) https://doi.org/10.1186/s12874-021-01237-6 Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X., El-Hussuna, A.: Utility metrics for evaluating synthetic health data generation methods: Validation study. JMIR Medical Informatics 10(4), 35734 (2022) https://doi.org/10.2196/35734 Qian et al. [2023] Qian, Z., Cebere, B., Schaar, M.: Synthcity: facilitating innovative use cases of synthetic data in different data modalities. arXiv, preprint (2023). https://doi.org/10.48550/arXiv.2301.07573 DataCebo, Inc. [2023] DataCebo, Inc.: Synthetic Data Metrics. (2023). DataCebo, Inc.. Version 0.9.3. https://docs.sdv.dev/sdmetrics/ Brenninkmeijer [2021] Brenninkmeijer, B.: Table Evaluator. GitHub (2021) Yale et al. [2019] Yale, A., Dash, S., Dutta, R., Guyon, I., Pavao, A., Bennett, K.P.: Privacy preserving synthetic health data. In: 27th European Symposium on Artificial Neural Networks, ESANN 2019, Bruges, Belgium, April 24-26, 2019 (2019) Yan et al. [2022] Yan, C., Yan, Y., Wan, Z., Zhang, Z., Omberg, L., Guinney, J., Mooney, S.D., Malin, B.A.: A multifaceted benchmarking of synthetic electronic health record generation models. Nature Communications 13(1) (2022) https://doi.org/10.1038/s41467-022-35295-1 Murtaza et al. [2023] Murtaza, H., Ahmed, M., Khan, N.F., Murtaza, G., Zafar, S., Bano, A.: Synthetic data generation: State of the art in health care domain. 
Computer Science Review 48, 100546 (2023) https://doi.org/10.1016/j.cosrev.2023.100546 [22] Lautrup, A.D., Hyrup, T., Zimek, A., Schneider-Kamp, P.: Systematised review of generative modelling tools and utility metrics for fully synthetic tabular data. [under review] Gower [1971] Gower, J.C.: A general coefficient of similarity and some of its properties. Biometrics 27(4), 857–871 (1971) https://doi.org/10.2307/2528823 Kamal et al. [2019] Kamal, S., ElEleimy, M., Hegazy, D., Nasr, M.: Hepatitis C Virus (HCV) for Egyptian patients. UCI Machine Learning Repository, dataset (2019). https://doi.org/10.24432/c5989v Chawla et al. [2002] Chawla, N.V., Bowyer, K.W., Hall, L.O., Kegelmeyer, W.P.: SMOTE: synthetic minority over-sampling technique. J. Artif. Intell. Res. 16, 321–357 (2002) https://doi.org/10.1613/jair.953 Yoon et al. [2020] Yoon, J., Drumright, L.N., Schaar, M.: Anonymization through data synthesis using generative adversarial networks (ADS-GAN). IEEE J. Biomed. Health Informatics 24(8), 2378–2388 (2020) https://doi.org/10.1109/JBHI.2020.2980262 Xu et al. [2019] Xu, L., Skoularidou, M., Cuesta-Infante, A., Veeramachaneni, K.: Modeling tabular data using conditional GAN. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 7333–7343 (2019) Goodfellow et al. [2014] Goodfellow, I.J., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., Bengio, Y.: Generative Adversarial Networks. arXiv, preprint (2014). https://doi.org/10.48550/arXiv.1406.2661 Sun and Erath [2015] Sun, L., Erath, A.: A bayesian network approach for population synthesis. 
Transportation Research Part C: Emerging Technologies 61, 49–62 (2015) https://doi.org/10.1016/j.trc.2015.10.010 Reiter [2005] Reiter, J.P.: Using cart to generate partially synthetic public use microdata. Journal of Official Statistics 21(3), 441–462 (2005) Drechsler and Reiter [2011] Drechsler, J., Reiter, J.P.: An empirical evaluation of easily implemented, nonparametric methods for generating synthetic datasets. Comput. Stat. Data Anal. 55(12), 3232–3243 (2011) European Medicines Agency [2018] European Medicines Agency: External guidance on the implementation of the European Medicines Agency policy on the publication of clinical data for medicinal products for human use. https://www.ema.europa.eu/en/human-regulatory/marketing-authorisation/clinical-data-publication/support-industry/external-guidance-implementation-european-medicines-agency-policy-publication-clinical-data (2018) Health Canada [2019] Health Canada: Public Release of Clinical Information: guidance document. https://www.canada.ca/en/health-canada/services/drug-health-product-review-approval/profile-public-release-clinical-information-guidance/document.html (2019) Karr et al. [2006] Karr, A.F., Kohnen, C.N., Oganian, A., Reiter, J.P., Sanil, A.P.: A framework for evaluating the utility of data altered to protect confidentiality. The American Statistician 60(3), 224–232 (2006) https://doi.org/10.1198/000313006x124640 Zhu et al. [2022] Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). https://doi.org/10.1109/BigData55660.2022.10020639 Villani [2009] Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. 
[2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). 
https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Rankin, D., Black, M., Bond, R., Wallace, J., Mulvenna, M., Epelde, G.: Reliability of supervised machine learning using synthetic data in health care: Model to preserve privacy for data sharing. JMIR Medical Informatics 8(7), 18910 (2020) https://doi.org/10.2196/18910 van Breugel et al. [2021] Breugel, B., Kyono, T., Berrevoets, J., Schaar, M.: DECAF: generating fair synthetic data using causally-aware generative networks. In: Ranzato, M., Beygelzimer, A., Dauphin, Y.N., Liang, P., Vaughan, J.W. (eds.) Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6-14, 2021, Virtual, pp. 22221–22233 (2021) Dankar et al. [2022] Dankar, F.K., Ibrahim, M.K., Ismail, L.: A multi-dimensional evaluation of synthetic data generators. IEEE Access 10, 11147–11158 (2022) https://doi.org/10.1109/access.2022.3144765 Figueira and Vaz [2022] Figueira, A., Vaz, B.: Survey on synthetic data generation, evaluation methods and GANs. 
Mathematics 10(15), 2733 (2022) https://doi.org/10.3390/math10152733 Dwork and Roth [2013] Dwork, C., Roth, A.: The algorithmic foundations of differential privacy. Foundations and Trends® in Theoretical Computer Science 9(3-4), 211–487 (2013) https://doi.org/10.1561/0400000042 Yale et al. [2020] Yale, A., Dash, S., Bhanot, K., Guyon, I., Erickson, J.S., Bennett, K.P.: Synthesizing quality open data assets from private health research studies. In: Abramowicz, W., Klein, G. (eds.) Business Information Systems Workshops - BIS 2020 International Workshops, Colorado Springs, CO, USA, June 8-10, 2020, Revised Selected Papers. Lecture Notes in Business Information Processing, vol. 394, pp. 324–335 (2020). https://doi.org/10.1007/978-3-030-61146-0_26 Lenz et al. [2021] Lenz, S., Hess, M., Binder, H.: Deep generative models in DataSHIELD. BMC Medical Research Methodology 21(1) (2021) https://doi.org/10.1186/s12874-021-01237-6 Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X., El-Hussuna, A.: Utility metrics for evaluating synthetic health data generation methods: Validation study. JMIR Medical Informatics 10(4), 35734 (2022) https://doi.org/10.2196/35734 Qian et al. [2023] Qian, Z., Cebere, B., Schaar, M.: Synthcity: facilitating innovative use cases of synthetic data in different data modalities. arXiv, preprint (2023). https://doi.org/10.48550/arXiv.2301.07573 DataCebo, Inc. [2023] DataCebo, Inc.: Synthetic Data Metrics. (2023). DataCebo, Inc.. Version 0.9.3. https://docs.sdv.dev/sdmetrics/ Brenninkmeijer [2021] Brenninkmeijer, B.: Table Evaluator. GitHub (2021) Yale et al. [2019] Yale, A., Dash, S., Dutta, R., Guyon, I., Pavao, A., Bennett, K.P.: Privacy preserving synthetic health data. In: 27th European Symposium on Artificial Neural Networks, ESANN 2019, Bruges, Belgium, April 24-26, 2019 (2019) Yan et al. 
[2022] Yan, C., Yan, Y., Wan, Z., Zhang, Z., Omberg, L., Guinney, J., Mooney, S.D., Malin, B.A.: A multifaceted benchmarking of synthetic electronic health record generation models. Nature Communications 13(1) (2022) https://doi.org/10.1038/s41467-022-35295-1 Murtaza et al. [2023] Murtaza, H., Ahmed, M., Khan, N.F., Murtaza, G., Zafar, S., Bano, A.: Synthetic data generation: State of the art in health care domain. Computer Science Review 48, 100546 (2023) https://doi.org/10.1016/j.cosrev.2023.100546 [22] Lautrup, A.D., Hyrup, T., Zimek, A., Schneider-Kamp, P.: Systematised review of generative modelling tools and utility metrics for fully synthetic tabular data. [under review] Gower [1971] Gower, J.C.: A general coefficient of similarity and some of its properties. Biometrics 27(4), 857–871 (1971) https://doi.org/10.2307/2528823 Kamal et al. [2019] Kamal, S., ElEleimy, M., Hegazy, D., Nasr, M.: Hepatitis C Virus (HCV) for Egyptian patients. UCI Machine Learning Repository, dataset (2019). https://doi.org/10.24432/c5989v Chawla et al. [2002] Chawla, N.V., Bowyer, K.W., Hall, L.O., Kegelmeyer, W.P.: SMOTE: synthetic minority over-sampling technique. J. Artif. Intell. Res. 16, 321–357 (2002) https://doi.org/10.1613/jair.953 Yoon et al. [2020] Yoon, J., Drumright, L.N., Schaar, M.: Anonymization through data synthesis using generative adversarial networks (ADS-GAN). IEEE J. Biomed. Health Informatics 24(8), 2378–2388 (2020) https://doi.org/10.1109/JBHI.2020.2980262 Xu et al. [2019] Xu, L., Skoularidou, M., Cuesta-Infante, A., Veeramachaneni, K.: Modeling tabular data using conditional GAN. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 7333–7343 (2019) Goodfellow et al. 
[2014] Goodfellow, I.J., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., Bengio, Y.: Generative Adversarial Networks. arXiv, preprint (2014). https://doi.org/10.48550/arXiv.1406.2661 Sun and Erath [2015] Sun, L., Erath, A.: A bayesian network approach for population synthesis. Transportation Research Part C: Emerging Technologies 61, 49–62 (2015) https://doi.org/10.1016/j.trc.2015.10.010 Reiter [2005] Reiter, J.P.: Using cart to generate partially synthetic public use microdata. Journal of Official Statistics 21(3), 441–462 (2005) Drechsler and Reiter [2011] Drechsler, J., Reiter, J.P.: An empirical evaluation of easily implemented, nonparametric methods for generating synthetic datasets. Comput. Stat. Data Anal. 55(12), 3232–3243 (2011) European Medicines Agency [2018] European Medicines Agency: External guidance on the implementation of the European Medicines Agency policy on the publication of clinical data for medicinal products for human use. https://www.ema.europa.eu/en/human-regulatory/marketing-authorisation/clinical-data-publication/support-industry/external-guidance-implementation-european-medicines-agency-policy-publication-clinical-data (2018) Health Canada [2019] Health Canada: Public Release of Clinical Information: guidance document. https://www.canada.ca/en/health-canada/services/drug-health-product-review-approval/profile-public-release-clinical-information-guidance/document.html (2019) Karr et al. [2006] Karr, A.F., Kohnen, C.N., Oganian, A., Reiter, J.P., Sanil, A.P.: A framework for evaluating the utility of data altered to protect confidentiality. The American Statistician 60(3), 224–232 (2006) https://doi.org/10.1198/000313006x124640 Zhu et al. [2022] Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) 
IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). https://doi.org/10.1109/BigData55660.2022.10020639 Villani [2009] Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. [2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. 
[2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Breugel, B., Kyono, T., Berrevoets, J., Schaar, M.: DECAF: generating fair synthetic data using causally-aware generative networks. In: Ranzato, M., Beygelzimer, A., Dauphin, Y.N., Liang, P., Vaughan, J.W. (eds.) Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6-14, 2021, Virtual, pp. 22221–22233 (2021) Dankar et al. [2022] Dankar, F.K., Ibrahim, M.K., Ismail, L.: A multi-dimensional evaluation of synthetic data generators. IEEE Access 10, 11147–11158 (2022) https://doi.org/10.1109/access.2022.3144765 Figueira and Vaz [2022] Figueira, A., Vaz, B.: Survey on synthetic data generation, evaluation methods and GANs. 
Mathematics 10(15), 2733 (2022) https://doi.org/10.3390/math10152733 Dwork and Roth [2013] Dwork, C., Roth, A.: The algorithmic foundations of differential privacy. Foundations and Trends® in Theoretical Computer Science 9(3-4), 211–487 (2013) https://doi.org/10.1561/0400000042 Yale et al. [2020] Yale, A., Dash, S., Bhanot, K., Guyon, I., Erickson, J.S., Bennett, K.P.: Synthesizing quality open data assets from private health research studies. In: Abramowicz, W., Klein, G. (eds.) Business Information Systems Workshops - BIS 2020 International Workshops, Colorado Springs, CO, USA, June 8-10, 2020, Revised Selected Papers. Lecture Notes in Business Information Processing, vol. 394, pp. 324–335 (2020). https://doi.org/10.1007/978-3-030-61146-0_26 Lenz et al. [2021] Lenz, S., Hess, M., Binder, H.: Deep generative models in DataSHIELD. BMC Medical Research Methodology 21(1) (2021) https://doi.org/10.1186/s12874-021-01237-6 Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X., El-Hussuna, A.: Utility metrics for evaluating synthetic health data generation methods: Validation study. JMIR Medical Informatics 10(4), 35734 (2022) https://doi.org/10.2196/35734 Qian et al. [2023] Qian, Z., Cebere, B., Schaar, M.: Synthcity: facilitating innovative use cases of synthetic data in different data modalities. arXiv, preprint (2023). https://doi.org/10.48550/arXiv.2301.07573 DataCebo, Inc. [2023] DataCebo, Inc.: Synthetic Data Metrics. (2023). DataCebo, Inc.. Version 0.9.3. https://docs.sdv.dev/sdmetrics/ Brenninkmeijer [2021] Brenninkmeijer, B.: Table Evaluator. GitHub (2021) Yale et al. [2019] Yale, A., Dash, S., Dutta, R., Guyon, I., Pavao, A., Bennett, K.P.: Privacy preserving synthetic health data. In: 27th European Symposium on Artificial Neural Networks, ESANN 2019, Bruges, Belgium, April 24-26, 2019 (2019) Yan et al. 
[2022] Yan, C., Yan, Y., Wan, Z., Zhang, Z., Omberg, L., Guinney, J., Mooney, S.D., Malin, B.A.: A multifaceted benchmarking of synthetic electronic health record generation models. Nature Communications 13(1) (2022) https://doi.org/10.1038/s41467-022-35295-1 Murtaza et al. [2023] Murtaza, H., Ahmed, M., Khan, N.F., Murtaza, G., Zafar, S., Bano, A.: Synthetic data generation: State of the art in health care domain. Computer Science Review 48, 100546 (2023) https://doi.org/10.1016/j.cosrev.2023.100546 [22] Lautrup, A.D., Hyrup, T., Zimek, A., Schneider-Kamp, P.: Systematised review of generative modelling tools and utility metrics for fully synthetic tabular data. [under review] Gower [1971] Gower, J.C.: A general coefficient of similarity and some of its properties. Biometrics 27(4), 857–871 (1971) https://doi.org/10.2307/2528823 Kamal et al. [2019] Kamal, S., ElEleimy, M., Hegazy, D., Nasr, M.: Hepatitis C Virus (HCV) for Egyptian patients. UCI Machine Learning Repository, dataset (2019). https://doi.org/10.24432/c5989v Chawla et al. [2002] Chawla, N.V., Bowyer, K.W., Hall, L.O., Kegelmeyer, W.P.: SMOTE: synthetic minority over-sampling technique. J. Artif. Intell. Res. 16, 321–357 (2002) https://doi.org/10.1613/jair.953 Yoon et al. [2020] Yoon, J., Drumright, L.N., Schaar, M.: Anonymization through data synthesis using generative adversarial networks (ADS-GAN). IEEE J. Biomed. Health Informatics 24(8), 2378–2388 (2020) https://doi.org/10.1109/JBHI.2020.2980262 Xu et al. [2019] Xu, L., Skoularidou, M., Cuesta-Infante, A., Veeramachaneni, K.: Modeling tabular data using conditional GAN. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 7333–7343 (2019) Goodfellow et al. 
[2014] Goodfellow, I.J., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., Bengio, Y.: Generative Adversarial Networks. arXiv, preprint (2014). https://doi.org/10.48550/arXiv.1406.2661 Sun and Erath [2015] Sun, L., Erath, A.: A bayesian network approach for population synthesis. Transportation Research Part C: Emerging Technologies 61, 49–62 (2015) https://doi.org/10.1016/j.trc.2015.10.010 Reiter [2005] Reiter, J.P.: Using cart to generate partially synthetic public use microdata. Journal of Official Statistics 21(3), 441–462 (2005) Drechsler and Reiter [2011] Drechsler, J., Reiter, J.P.: An empirical evaluation of easily implemented, nonparametric methods for generating synthetic datasets. Comput. Stat. Data Anal. 55(12), 3232–3243 (2011) European Medicines Agency [2018] European Medicines Agency: External guidance on the implementation of the European Medicines Agency policy on the publication of clinical data for medicinal products for human use. https://www.ema.europa.eu/en/human-regulatory/marketing-authorisation/clinical-data-publication/support-industry/external-guidance-implementation-european-medicines-agency-policy-publication-clinical-data (2018) Health Canada [2019] Health Canada: Public Release of Clinical Information: guidance document. https://www.canada.ca/en/health-canada/services/drug-health-product-review-approval/profile-public-release-clinical-information-guidance/document.html (2019) Karr et al. [2006] Karr, A.F., Kohnen, C.N., Oganian, A., Reiter, J.P., Sanil, A.P.: A framework for evaluating the utility of data altered to protect confidentiality. The American Statistician 60(3), 224–232 (2006) https://doi.org/10.1198/000313006x124640 Zhu et al. [2022] Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) 
IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). https://doi.org/10.1109/BigData55660.2022.10020639 Villani [2009] Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. [2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. 
[2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Dankar, F.K., Ibrahim, M.K., Ismail, L.: A multi-dimensional evaluation of synthetic data generators. IEEE Access 10, 11147–11158 (2022) https://doi.org/10.1109/access.2022.3144765 Figueira and Vaz [2022] Figueira, A., Vaz, B.: Survey on synthetic data generation, evaluation methods and GANs. Mathematics 10(15), 2733 (2022) https://doi.org/10.3390/math10152733 Dwork and Roth [2013] Dwork, C., Roth, A.: The algorithmic foundations of differential privacy. Foundations and Trends® in Theoretical Computer Science 9(3-4), 211–487 (2013) https://doi.org/10.1561/0400000042 Yale et al. 
[2020] Yale, A., Dash, S., Bhanot, K., Guyon, I., Erickson, J.S., Bennett, K.P.: Synthesizing quality open data assets from private health research studies. In: Abramowicz, W., Klein, G. (eds.) Business Information Systems Workshops - BIS 2020 International Workshops, Colorado Springs, CO, USA, June 8-10, 2020, Revised Selected Papers. Lecture Notes in Business Information Processing, vol. 394, pp. 324–335 (2020). https://doi.org/10.1007/978-3-030-61146-0_26 Lenz et al. [2021] Lenz, S., Hess, M., Binder, H.: Deep generative models in DataSHIELD. BMC Medical Research Methodology 21(1) (2021) https://doi.org/10.1186/s12874-021-01237-6 Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X., El-Hussuna, A.: Utility metrics for evaluating synthetic health data generation methods: Validation study. JMIR Medical Informatics 10(4), 35734 (2022) https://doi.org/10.2196/35734 Qian et al. [2023] Qian, Z., Cebere, B., Schaar, M.: Synthcity: facilitating innovative use cases of synthetic data in different data modalities. arXiv, preprint (2023). https://doi.org/10.48550/arXiv.2301.07573 DataCebo, Inc. [2023] DataCebo, Inc.: Synthetic Data Metrics. (2023). DataCebo, Inc.. Version 0.9.3. https://docs.sdv.dev/sdmetrics/ Brenninkmeijer [2021] Brenninkmeijer, B.: Table Evaluator. GitHub (2021) Yale et al. [2019] Yale, A., Dash, S., Dutta, R., Guyon, I., Pavao, A., Bennett, K.P.: Privacy preserving synthetic health data. In: 27th European Symposium on Artificial Neural Networks, ESANN 2019, Bruges, Belgium, April 24-26, 2019 (2019) Yan et al. [2022] Yan, C., Yan, Y., Wan, Z., Zhang, Z., Omberg, L., Guinney, J., Mooney, S.D., Malin, B.A.: A multifaceted benchmarking of synthetic electronic health record generation models. Nature Communications 13(1) (2022) https://doi.org/10.1038/s41467-022-35295-1 Murtaza et al. [2023] Murtaza, H., Ahmed, M., Khan, N.F., Murtaza, G., Zafar, S., Bano, A.: Synthetic data generation: State of the art in health care domain. 
Computer Science Review 48, 100546 (2023) https://doi.org/10.1016/j.cosrev.2023.100546 [22] Lautrup, A.D., Hyrup, T., Zimek, A., Schneider-Kamp, P.: Systematised review of generative modelling tools and utility metrics for fully synthetic tabular data. [under review] Gower [1971] Gower, J.C.: A general coefficient of similarity and some of its properties. Biometrics 27(4), 857–871 (1971) https://doi.org/10.2307/2528823 Kamal et al. [2019] Kamal, S., ElEleimy, M., Hegazy, D., Nasr, M.: Hepatitis C Virus (HCV) for Egyptian patients. UCI Machine Learning Repository, dataset (2019). https://doi.org/10.24432/c5989v Chawla et al. [2002] Chawla, N.V., Bowyer, K.W., Hall, L.O., Kegelmeyer, W.P.: SMOTE: synthetic minority over-sampling technique. J. Artif. Intell. Res. 16, 321–357 (2002) https://doi.org/10.1613/jair.953 Yoon et al. [2020] Yoon, J., Drumright, L.N., Schaar, M.: Anonymization through data synthesis using generative adversarial networks (ADS-GAN). IEEE J. Biomed. Health Informatics 24(8), 2378–2388 (2020) https://doi.org/10.1109/JBHI.2020.2980262 Xu et al. [2019] Xu, L., Skoularidou, M., Cuesta-Infante, A., Veeramachaneni, K.: Modeling tabular data using conditional GAN. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 7333–7343 (2019) Goodfellow et al. [2014] Goodfellow, I.J., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., Bengio, Y.: Generative Adversarial Networks. arXiv, preprint (2014). https://doi.org/10.48550/arXiv.1406.2661 Sun and Erath [2015] Sun, L., Erath, A.: A bayesian network approach for population synthesis. 
Transportation Research Part C: Emerging Technologies 61, 49–62 (2015) https://doi.org/10.1016/j.trc.2015.10.010 Reiter [2005] Reiter, J.P.: Using cart to generate partially synthetic public use microdata. Journal of Official Statistics 21(3), 441–462 (2005) Drechsler and Reiter [2011] Drechsler, J., Reiter, J.P.: An empirical evaluation of easily implemented, nonparametric methods for generating synthetic datasets. Comput. Stat. Data Anal. 55(12), 3232–3243 (2011) European Medicines Agency [2018] European Medicines Agency: External guidance on the implementation of the European Medicines Agency policy on the publication of clinical data for medicinal products for human use. https://www.ema.europa.eu/en/human-regulatory/marketing-authorisation/clinical-data-publication/support-industry/external-guidance-implementation-european-medicines-agency-policy-publication-clinical-data (2018) Health Canada [2019] Health Canada: Public Release of Clinical Information: guidance document. https://www.canada.ca/en/health-canada/services/drug-health-product-review-approval/profile-public-release-clinical-information-guidance/document.html (2019) Karr et al. [2006] Karr, A.F., Kohnen, C.N., Oganian, A., Reiter, J.P., Sanil, A.P.: A framework for evaluating the utility of data altered to protect confidentiality. The American Statistician 60(3), 224–232 (2006) https://doi.org/10.1198/000313006x124640 Zhu et al. [2022] Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). https://doi.org/10.1109/BigData55660.2022.10020639 Villani [2009] Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. 
[2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). 
https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Figueira, A., Vaz, B.: Survey on synthetic data generation, evaluation methods and GANs. Mathematics 10(15), 2733 (2022) https://doi.org/10.3390/math10152733 Dwork and Roth [2013] Dwork, C., Roth, A.: The algorithmic foundations of differential privacy. Foundations and Trends® in Theoretical Computer Science 9(3-4), 211–487 (2013) https://doi.org/10.1561/0400000042 Yale et al. [2020] Yale, A., Dash, S., Bhanot, K., Guyon, I., Erickson, J.S., Bennett, K.P.: Synthesizing quality open data assets from private health research studies. In: Abramowicz, W., Klein, G. (eds.) Business Information Systems Workshops - BIS 2020 International Workshops, Colorado Springs, CO, USA, June 8-10, 2020, Revised Selected Papers. Lecture Notes in Business Information Processing, vol. 394, pp. 324–335 (2020). https://doi.org/10.1007/978-3-030-61146-0_26 Lenz et al. [2021] Lenz, S., Hess, M., Binder, H.: Deep generative models in DataSHIELD. BMC Medical Research Methodology 21(1) (2021) https://doi.org/10.1186/s12874-021-01237-6 Emam et al. 
[2022] Emam, K.E., Mosquera, L., Fang, X., El-Hussuna, A.: Utility metrics for evaluating synthetic health data generation methods: Validation study. JMIR Medical Informatics 10(4), 35734 (2022) https://doi.org/10.2196/35734 Qian et al. [2023] Qian, Z., Cebere, B., Schaar, M.: Synthcity: facilitating innovative use cases of synthetic data in different data modalities. arXiv, preprint (2023). https://doi.org/10.48550/arXiv.2301.07573 DataCebo, Inc. [2023] DataCebo, Inc.: Synthetic Data Metrics. (2023). DataCebo, Inc.. Version 0.9.3. https://docs.sdv.dev/sdmetrics/ Brenninkmeijer [2021] Brenninkmeijer, B.: Table Evaluator. GitHub (2021) Yale et al. [2019] Yale, A., Dash, S., Dutta, R., Guyon, I., Pavao, A., Bennett, K.P.: Privacy preserving synthetic health data. In: 27th European Symposium on Artificial Neural Networks, ESANN 2019, Bruges, Belgium, April 24-26, 2019 (2019) Yan et al. [2022] Yan, C., Yan, Y., Wan, Z., Zhang, Z., Omberg, L., Guinney, J., Mooney, S.D., Malin, B.A.: A multifaceted benchmarking of synthetic electronic health record generation models. Nature Communications 13(1) (2022) https://doi.org/10.1038/s41467-022-35295-1 Murtaza et al. [2023] Murtaza, H., Ahmed, M., Khan, N.F., Murtaza, G., Zafar, S., Bano, A.: Synthetic data generation: State of the art in health care domain. Computer Science Review 48, 100546 (2023) https://doi.org/10.1016/j.cosrev.2023.100546 [22] Lautrup, A.D., Hyrup, T., Zimek, A., Schneider-Kamp, P.: Systematised review of generative modelling tools and utility metrics for fully synthetic tabular data. [under review] Gower [1971] Gower, J.C.: A general coefficient of similarity and some of its properties. Biometrics 27(4), 857–871 (1971) https://doi.org/10.2307/2528823 Kamal et al. [2019] Kamal, S., ElEleimy, M., Hegazy, D., Nasr, M.: Hepatitis C Virus (HCV) for Egyptian patients. UCI Machine Learning Repository, dataset (2019). https://doi.org/10.24432/c5989v Chawla et al. 
[2002] Chawla, N.V., Bowyer, K.W., Hall, L.O., Kegelmeyer, W.P.: SMOTE: synthetic minority over-sampling technique. J. Artif. Intell. Res. 16, 321–357 (2002) https://doi.org/10.1613/jair.953 Yoon et al. [2020] Yoon, J., Drumright, L.N., Schaar, M.: Anonymization through data synthesis using generative adversarial networks (ADS-GAN). IEEE J. Biomed. Health Informatics 24(8), 2378–2388 (2020) https://doi.org/10.1109/JBHI.2020.2980262 Xu et al. [2019] Xu, L., Skoularidou, M., Cuesta-Infante, A., Veeramachaneni, K.: Modeling tabular data using conditional GAN. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 7333–7343 (2019) Goodfellow et al. [2014] Goodfellow, I.J., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., Bengio, Y.: Generative Adversarial Networks. arXiv, preprint (2014). https://doi.org/10.48550/arXiv.1406.2661 Sun and Erath [2015] Sun, L., Erath, A.: A bayesian network approach for population synthesis. Transportation Research Part C: Emerging Technologies 61, 49–62 (2015) https://doi.org/10.1016/j.trc.2015.10.010 Reiter [2005] Reiter, J.P.: Using cart to generate partially synthetic public use microdata. Journal of Official Statistics 21(3), 441–462 (2005) Drechsler and Reiter [2011] Drechsler, J., Reiter, J.P.: An empirical evaluation of easily implemented, nonparametric methods for generating synthetic datasets. Comput. Stat. Data Anal. 55(12), 3232–3243 (2011) European Medicines Agency [2018] European Medicines Agency: External guidance on the implementation of the European Medicines Agency policy on the publication of clinical data for medicinal products for human use. 
https://www.ema.europa.eu/en/human-regulatory/marketing-authorisation/clinical-data-publication/support-industry/external-guidance-implementation-european-medicines-agency-policy-publication-clinical-data (2018) Health Canada [2019] Health Canada: Public Release of Clinical Information: guidance document. https://www.canada.ca/en/health-canada/services/drug-health-product-review-approval/profile-public-release-clinical-information-guidance/document.html (2019) Karr et al. [2006] Karr, A.F., Kohnen, C.N., Oganian, A., Reiter, J.P., Sanil, A.P.: A framework for evaluating the utility of data altered to protect confidentiality. The American Statistician 60(3), 224–232 (2006) https://doi.org/10.1198/000313006x124640 Zhu et al. [2022] Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). https://doi.org/10.1109/BigData55660.2022.10020639 Villani [2009] Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. [2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. 
[2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. 
[2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Dwork, C., Roth, A.: The algorithmic foundations of differential privacy. Foundations and Trends® in Theoretical Computer Science 9(3-4), 211–487 (2013) https://doi.org/10.1561/0400000042 Yale et al. [2020] Yale, A., Dash, S., Bhanot, K., Guyon, I., Erickson, J.S., Bennett, K.P.: Synthesizing quality open data assets from private health research studies. In: Abramowicz, W., Klein, G. (eds.) Business Information Systems Workshops - BIS 2020 International Workshops, Colorado Springs, CO, USA, June 8-10, 2020, Revised Selected Papers. Lecture Notes in Business Information Processing, vol. 394, pp. 324–335 (2020). https://doi.org/10.1007/978-3-030-61146-0_26 Lenz et al. [2021] Lenz, S., Hess, M., Binder, H.: Deep generative models in DataSHIELD. BMC Medical Research Methodology 21(1) (2021) https://doi.org/10.1186/s12874-021-01237-6 Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X., El-Hussuna, A.: Utility metrics for evaluating synthetic health data generation methods: Validation study. JMIR Medical Informatics 10(4), 35734 (2022) https://doi.org/10.2196/35734 Qian et al. [2023] Qian, Z., Cebere, B., Schaar, M.: Synthcity: facilitating innovative use cases of synthetic data in different data modalities. arXiv, preprint (2023). https://doi.org/10.48550/arXiv.2301.07573 DataCebo, Inc. [2023] DataCebo, Inc.: Synthetic Data Metrics. (2023). DataCebo, Inc.. Version 0.9.3. https://docs.sdv.dev/sdmetrics/ Brenninkmeijer [2021] Brenninkmeijer, B.: Table Evaluator. GitHub (2021) Yale et al. [2019] Yale, A., Dash, S., Dutta, R., Guyon, I., Pavao, A., Bennett, K.P.: Privacy preserving synthetic health data. In: 27th European Symposium on Artificial Neural Networks, ESANN 2019, Bruges, Belgium, April 24-26, 2019 (2019) Yan et al. 
[2022] Yan, C., Yan, Y., Wan, Z., Zhang, Z., Omberg, L., Guinney, J., Mooney, S.D., Malin, B.A.: A multifaceted benchmarking of synthetic electronic health record generation models. Nature Communications 13(1) (2022) https://doi.org/10.1038/s41467-022-35295-1 Murtaza et al. [2023] Murtaza, H., Ahmed, M., Khan, N.F., Murtaza, G., Zafar, S., Bano, A.: Synthetic data generation: State of the art in health care domain. Computer Science Review 48, 100546 (2023) https://doi.org/10.1016/j.cosrev.2023.100546 [22] Lautrup, A.D., Hyrup, T., Zimek, A., Schneider-Kamp, P.: Systematised review of generative modelling tools and utility metrics for fully synthetic tabular data. [under review] Gower [1971] Gower, J.C.: A general coefficient of similarity and some of its properties. Biometrics 27(4), 857–871 (1971) https://doi.org/10.2307/2528823 Kamal et al. [2019] Kamal, S., ElEleimy, M., Hegazy, D., Nasr, M.: Hepatitis C Virus (HCV) for Egyptian patients. UCI Machine Learning Repository, dataset (2019). https://doi.org/10.24432/c5989v Chawla et al. [2002] Chawla, N.V., Bowyer, K.W., Hall, L.O., Kegelmeyer, W.P.: SMOTE: synthetic minority over-sampling technique. J. Artif. Intell. Res. 16, 321–357 (2002) https://doi.org/10.1613/jair.953 Yoon et al. [2020] Yoon, J., Drumright, L.N., Schaar, M.: Anonymization through data synthesis using generative adversarial networks (ADS-GAN). IEEE J. Biomed. Health Informatics 24(8), 2378–2388 (2020) https://doi.org/10.1109/JBHI.2020.2980262 Xu et al. [2019] Xu, L., Skoularidou, M., Cuesta-Infante, A., Veeramachaneni, K.: Modeling tabular data using conditional GAN. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 7333–7343 (2019) Goodfellow et al. 
[2014] Goodfellow, I.J., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., Bengio, Y.: Generative Adversarial Networks. arXiv, preprint (2014). https://doi.org/10.48550/arXiv.1406.2661 Sun and Erath [2015] Sun, L., Erath, A.: A bayesian network approach for population synthesis. Transportation Research Part C: Emerging Technologies 61, 49–62 (2015) https://doi.org/10.1016/j.trc.2015.10.010 Reiter [2005] Reiter, J.P.: Using cart to generate partially synthetic public use microdata. Journal of Official Statistics 21(3), 441–462 (2005) Drechsler and Reiter [2011] Drechsler, J., Reiter, J.P.: An empirical evaluation of easily implemented, nonparametric methods for generating synthetic datasets. Comput. Stat. Data Anal. 55(12), 3232–3243 (2011) European Medicines Agency [2018] European Medicines Agency: External guidance on the implementation of the European Medicines Agency policy on the publication of clinical data for medicinal products for human use. https://www.ema.europa.eu/en/human-regulatory/marketing-authorisation/clinical-data-publication/support-industry/external-guidance-implementation-european-medicines-agency-policy-publication-clinical-data (2018) Health Canada [2019] Health Canada: Public Release of Clinical Information: guidance document. https://www.canada.ca/en/health-canada/services/drug-health-product-review-approval/profile-public-release-clinical-information-guidance/document.html (2019) Karr et al. [2006] Karr, A.F., Kohnen, C.N., Oganian, A., Reiter, J.P., Sanil, A.P.: A framework for evaluating the utility of data altered to protect confidentiality. The American Statistician 60(3), 224–232 (2006) https://doi.org/10.1198/000313006x124640 Zhu et al. [2022] Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) 
IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). https://doi.org/10.1109/BigData55660.2022.10020639 Villani [2009] Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. [2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. 
[2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Yale, A., Dash, S., Bhanot, K., Guyon, I., Erickson, J.S., Bennett, K.P.: Synthesizing quality open data assets from private health research studies. In: Abramowicz, W., Klein, G. (eds.) Business Information Systems Workshops - BIS 2020 International Workshops, Colorado Springs, CO, USA, June 8-10, 2020, Revised Selected Papers. Lecture Notes in Business Information Processing, vol. 394, pp. 324–335 (2020). https://doi.org/10.1007/978-3-030-61146-0_26 Lenz et al. [2021] Lenz, S., Hess, M., Binder, H.: Deep generative models in DataSHIELD. BMC Medical Research Methodology 21(1) (2021) https://doi.org/10.1186/s12874-021-01237-6 Emam et al. 
[2022] Emam, K.E., Mosquera, L., Fang, X., El-Hussuna, A.: Utility metrics for evaluating synthetic health data generation methods: Validation study. JMIR Medical Informatics 10(4), 35734 (2022) https://doi.org/10.2196/35734 Qian et al. [2023] Qian, Z., Cebere, B., Schaar, M.: Synthcity: facilitating innovative use cases of synthetic data in different data modalities. arXiv, preprint (2023). https://doi.org/10.48550/arXiv.2301.07573 DataCebo, Inc. [2023] DataCebo, Inc.: Synthetic Data Metrics. (2023). DataCebo, Inc.. Version 0.9.3. https://docs.sdv.dev/sdmetrics/ Brenninkmeijer [2021] Brenninkmeijer, B.: Table Evaluator. GitHub (2021) Yale et al. [2019] Yale, A., Dash, S., Dutta, R., Guyon, I., Pavao, A., Bennett, K.P.: Privacy preserving synthetic health data. In: 27th European Symposium on Artificial Neural Networks, ESANN 2019, Bruges, Belgium, April 24-26, 2019 (2019) Yan et al. [2022] Yan, C., Yan, Y., Wan, Z., Zhang, Z., Omberg, L., Guinney, J., Mooney, S.D., Malin, B.A.: A multifaceted benchmarking of synthetic electronic health record generation models. Nature Communications 13(1) (2022) https://doi.org/10.1038/s41467-022-35295-1 Murtaza et al. [2023] Murtaza, H., Ahmed, M., Khan, N.F., Murtaza, G., Zafar, S., Bano, A.: Synthetic data generation: State of the art in health care domain. Computer Science Review 48, 100546 (2023) https://doi.org/10.1016/j.cosrev.2023.100546 [22] Lautrup, A.D., Hyrup, T., Zimek, A., Schneider-Kamp, P.: Systematised review of generative modelling tools and utility metrics for fully synthetic tabular data. [under review] Gower [1971] Gower, J.C.: A general coefficient of similarity and some of its properties. Biometrics 27(4), 857–871 (1971) https://doi.org/10.2307/2528823 Kamal et al. [2019] Kamal, S., ElEleimy, M., Hegazy, D., Nasr, M.: Hepatitis C Virus (HCV) for Egyptian patients. UCI Machine Learning Repository, dataset (2019). https://doi.org/10.24432/c5989v Chawla et al. 
[2002] Chawla, N.V., Bowyer, K.W., Hall, L.O., Kegelmeyer, W.P.: SMOTE: synthetic minority over-sampling technique. J. Artif. Intell. Res. 16, 321–357 (2002) https://doi.org/10.1613/jair.953 Yoon et al. [2020] Yoon, J., Drumright, L.N., Schaar, M.: Anonymization through data synthesis using generative adversarial networks (ADS-GAN). IEEE J. Biomed. Health Informatics 24(8), 2378–2388 (2020) https://doi.org/10.1109/JBHI.2020.2980262 Xu et al. [2019] Xu, L., Skoularidou, M., Cuesta-Infante, A., Veeramachaneni, K.: Modeling tabular data using conditional GAN. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 7333–7343 (2019) Goodfellow et al. [2014] Goodfellow, I.J., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., Bengio, Y.: Generative Adversarial Networks. arXiv, preprint (2014). https://doi.org/10.48550/arXiv.1406.2661 Sun and Erath [2015] Sun, L., Erath, A.: A bayesian network approach for population synthesis. Transportation Research Part C: Emerging Technologies 61, 49–62 (2015) https://doi.org/10.1016/j.trc.2015.10.010 Reiter [2005] Reiter, J.P.: Using cart to generate partially synthetic public use microdata. Journal of Official Statistics 21(3), 441–462 (2005) Drechsler and Reiter [2011] Drechsler, J., Reiter, J.P.: An empirical evaluation of easily implemented, nonparametric methods for generating synthetic datasets. Comput. Stat. Data Anal. 55(12), 3232–3243 (2011) European Medicines Agency [2018] European Medicines Agency: External guidance on the implementation of the European Medicines Agency policy on the publication of clinical data for medicinal products for human use. 
https://www.ema.europa.eu/en/human-regulatory/marketing-authorisation/clinical-data-publication/support-industry/external-guidance-implementation-european-medicines-agency-policy-publication-clinical-data (2018) Health Canada [2019] Health Canada: Public Release of Clinical Information: guidance document. https://www.canada.ca/en/health-canada/services/drug-health-product-review-approval/profile-public-release-clinical-information-guidance/document.html (2019) Karr et al. [2006] Karr, A.F., Kohnen, C.N., Oganian, A., Reiter, J.P., Sanil, A.P.: A framework for evaluating the utility of data altered to protect confidentiality. The American Statistician 60(3), 224–232 (2006) https://doi.org/10.1198/000313006x124640 Zhu et al. [2022] Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). https://doi.org/10.1109/BigData55660.2022.10020639 Villani [2009] Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. [2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. 
[2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. 
[2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Lenz, S., Hess, M., Binder, H.: Deep generative models in DataSHIELD. BMC Medical Research Methodology 21(1) (2021) https://doi.org/10.1186/s12874-021-01237-6 Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X., El-Hussuna, A.: Utility metrics for evaluating synthetic health data generation methods: Validation study. JMIR Medical Informatics 10(4), 35734 (2022) https://doi.org/10.2196/35734 Qian et al. [2023] Qian, Z., Cebere, B., Schaar, M.: Synthcity: facilitating innovative use cases of synthetic data in different data modalities. arXiv, preprint (2023). https://doi.org/10.48550/arXiv.2301.07573 DataCebo, Inc. [2023] DataCebo, Inc.: Synthetic Data Metrics. (2023). DataCebo, Inc.. Version 0.9.3. https://docs.sdv.dev/sdmetrics/ Brenninkmeijer [2021] Brenninkmeijer, B.: Table Evaluator. GitHub (2021) Yale et al. [2019] Yale, A., Dash, S., Dutta, R., Guyon, I., Pavao, A., Bennett, K.P.: Privacy preserving synthetic health data. In: 27th European Symposium on Artificial Neural Networks, ESANN 2019, Bruges, Belgium, April 24-26, 2019 (2019) Yan et al. [2022] Yan, C., Yan, Y., Wan, Z., Zhang, Z., Omberg, L., Guinney, J., Mooney, S.D., Malin, B.A.: A multifaceted benchmarking of synthetic electronic health record generation models. Nature Communications 13(1) (2022) https://doi.org/10.1038/s41467-022-35295-1 Murtaza et al. [2023] Murtaza, H., Ahmed, M., Khan, N.F., Murtaza, G., Zafar, S., Bano, A.: Synthetic data generation: State of the art in health care domain. Computer Science Review 48, 100546 (2023) https://doi.org/10.1016/j.cosrev.2023.100546 [22] Lautrup, A.D., Hyrup, T., Zimek, A., Schneider-Kamp, P.: Systematised review of generative modelling tools and utility metrics for fully synthetic tabular data. 
[under review] Gower [1971] Gower, J.C.: A general coefficient of similarity and some of its properties. Biometrics 27(4), 857–871 (1971) https://doi.org/10.2307/2528823 Kamal et al. [2019] Kamal, S., ElEleimy, M., Hegazy, D., Nasr, M.: Hepatitis C Virus (HCV) for Egyptian patients. UCI Machine Learning Repository, dataset (2019). https://doi.org/10.24432/c5989v Chawla et al. [2002] Chawla, N.V., Bowyer, K.W., Hall, L.O., Kegelmeyer, W.P.: SMOTE: synthetic minority over-sampling technique. J. Artif. Intell. Res. 16, 321–357 (2002) https://doi.org/10.1613/jair.953 Yoon et al. [2020] Yoon, J., Drumright, L.N., Schaar, M.: Anonymization through data synthesis using generative adversarial networks (ADS-GAN). IEEE J. Biomed. Health Informatics 24(8), 2378–2388 (2020) https://doi.org/10.1109/JBHI.2020.2980262 Xu et al. [2019] Xu, L., Skoularidou, M., Cuesta-Infante, A., Veeramachaneni, K.: Modeling tabular data using conditional GAN. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 7333–7343 (2019) Goodfellow et al. [2014] Goodfellow, I.J., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., Bengio, Y.: Generative Adversarial Networks. arXiv, preprint (2014). https://doi.org/10.48550/arXiv.1406.2661 Sun and Erath [2015] Sun, L., Erath, A.: A bayesian network approach for population synthesis. Transportation Research Part C: Emerging Technologies 61, 49–62 (2015) https://doi.org/10.1016/j.trc.2015.10.010 Reiter [2005] Reiter, J.P.: Using cart to generate partially synthetic public use microdata. Journal of Official Statistics 21(3), 441–462 (2005) Drechsler and Reiter [2011] Drechsler, J., Reiter, J.P.: An empirical evaluation of easily implemented, nonparametric methods for generating synthetic datasets. Comput. 
Stat. Data Anal. 55(12), 3232–3243 (2011) European Medicines Agency [2018] European Medicines Agency: External guidance on the implementation of the European Medicines Agency policy on the publication of clinical data for medicinal products for human use. https://www.ema.europa.eu/en/human-regulatory/marketing-authorisation/clinical-data-publication/support-industry/external-guidance-implementation-european-medicines-agency-policy-publication-clinical-data (2018) Health Canada [2019] Health Canada: Public Release of Clinical Information: guidance document. https://www.canada.ca/en/health-canada/services/drug-health-product-review-approval/profile-public-release-clinical-information-guidance/document.html (2019) Karr et al. [2006] Karr, A.F., Kohnen, C.N., Oganian, A., Reiter, J.P., Sanil, A.P.: A framework for evaluating the utility of data altered to protect confidentiality. The American Statistician 60(3), 224–232 (2006) https://doi.org/10.1198/000313006x124640 Zhu et al. [2022] Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). https://doi.org/10.1109/BigData55660.2022.10020639 Villani [2009] Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. [2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. 
IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. 
[2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Emam, K.E., Mosquera, L., Fang, X., El-Hussuna, A.: Utility metrics for evaluating synthetic health data generation methods: Validation study. JMIR Medical Informatics 10(4), 35734 (2022) https://doi.org/10.2196/35734 Qian et al. [2023] Qian, Z., Cebere, B., Schaar, M.: Synthcity: facilitating innovative use cases of synthetic data in different data modalities. arXiv, preprint (2023). https://doi.org/10.48550/arXiv.2301.07573 DataCebo, Inc. [2023] DataCebo, Inc.: Synthetic Data Metrics. (2023). DataCebo, Inc.. Version 0.9.3. https://docs.sdv.dev/sdmetrics/ Brenninkmeijer [2021] Brenninkmeijer, B.: Table Evaluator. GitHub (2021) Yale et al. [2019] Yale, A., Dash, S., Dutta, R., Guyon, I., Pavao, A., Bennett, K.P.: Privacy preserving synthetic health data. In: 27th European Symposium on Artificial Neural Networks, ESANN 2019, Bruges, Belgium, April 24-26, 2019 (2019) Yan et al. [2022] Yan, C., Yan, Y., Wan, Z., Zhang, Z., Omberg, L., Guinney, J., Mooney, S.D., Malin, B.A.: A multifaceted benchmarking of synthetic electronic health record generation models. Nature Communications 13(1) (2022) https://doi.org/10.1038/s41467-022-35295-1 Murtaza et al. [2023] Murtaza, H., Ahmed, M., Khan, N.F., Murtaza, G., Zafar, S., Bano, A.: Synthetic data generation: State of the art in health care domain. 
Computer Science Review 48, 100546 (2023) https://doi.org/10.1016/j.cosrev.2023.100546 [22] Lautrup, A.D., Hyrup, T., Zimek, A., Schneider-Kamp, P.: Systematised review of generative modelling tools and utility metrics for fully synthetic tabular data. [under review] Gower [1971] Gower, J.C.: A general coefficient of similarity and some of its properties. Biometrics 27(4), 857–871 (1971) https://doi.org/10.2307/2528823 Kamal et al. [2019] Kamal, S., ElEleimy, M., Hegazy, D., Nasr, M.: Hepatitis C Virus (HCV) for Egyptian patients. UCI Machine Learning Repository, dataset (2019). https://doi.org/10.24432/c5989v Chawla et al. [2002] Chawla, N.V., Bowyer, K.W., Hall, L.O., Kegelmeyer, W.P.: SMOTE: synthetic minority over-sampling technique. J. Artif. Intell. Res. 16, 321–357 (2002) https://doi.org/10.1613/jair.953 Yoon et al. [2020] Yoon, J., Drumright, L.N., Schaar, M.: Anonymization through data synthesis using generative adversarial networks (ADS-GAN). IEEE J. Biomed. Health Informatics 24(8), 2378–2388 (2020) https://doi.org/10.1109/JBHI.2020.2980262 Xu et al. [2019] Xu, L., Skoularidou, M., Cuesta-Infante, A., Veeramachaneni, K.: Modeling tabular data using conditional GAN. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 7333–7343 (2019) Goodfellow et al. [2014] Goodfellow, I.J., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., Bengio, Y.: Generative Adversarial Networks. arXiv, preprint (2014). https://doi.org/10.48550/arXiv.1406.2661 Sun and Erath [2015] Sun, L., Erath, A.: A bayesian network approach for population synthesis. 
Transportation Research Part C: Emerging Technologies 61, 49–62 (2015) https://doi.org/10.1016/j.trc.2015.10.010 Reiter [2005] Reiter, J.P.: Using cart to generate partially synthetic public use microdata. Journal of Official Statistics 21(3), 441–462 (2005) Drechsler and Reiter [2011] Drechsler, J., Reiter, J.P.: An empirical evaluation of easily implemented, nonparametric methods for generating synthetic datasets. Comput. Stat. Data Anal. 55(12), 3232–3243 (2011) European Medicines Agency [2018] European Medicines Agency: External guidance on the implementation of the European Medicines Agency policy on the publication of clinical data for medicinal products for human use. https://www.ema.europa.eu/en/human-regulatory/marketing-authorisation/clinical-data-publication/support-industry/external-guidance-implementation-european-medicines-agency-policy-publication-clinical-data (2018) Health Canada [2019] Health Canada: Public Release of Clinical Information: guidance document. https://www.canada.ca/en/health-canada/services/drug-health-product-review-approval/profile-public-release-clinical-information-guidance/document.html (2019) Karr et al. [2006] Karr, A.F., Kohnen, C.N., Oganian, A., Reiter, J.P., Sanil, A.P.: A framework for evaluating the utility of data altered to protect confidentiality. The American Statistician 60(3), 224–232 (2006) https://doi.org/10.1198/000313006x124640 Zhu et al. [2022] Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). https://doi.org/10.1109/BigData55660.2022.10020639 Villani [2009] Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. 
[2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). 
https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Qian, Z., Cebere, B., Schaar, M.: Synthcity: facilitating innovative use cases of synthetic data in different data modalities. arXiv, preprint (2023). https://doi.org/10.48550/arXiv.2301.07573 DataCebo, Inc. [2023] DataCebo, Inc.: Synthetic Data Metrics. (2023). DataCebo, Inc.. Version 0.9.3. https://docs.sdv.dev/sdmetrics/ Brenninkmeijer [2021] Brenninkmeijer, B.: Table Evaluator. GitHub (2021) Yale et al. [2019] Yale, A., Dash, S., Dutta, R., Guyon, I., Pavao, A., Bennett, K.P.: Privacy preserving synthetic health data. In: 27th European Symposium on Artificial Neural Networks, ESANN 2019, Bruges, Belgium, April 24-26, 2019 (2019) Yan et al. [2022] Yan, C., Yan, Y., Wan, Z., Zhang, Z., Omberg, L., Guinney, J., Mooney, S.D., Malin, B.A.: A multifaceted benchmarking of synthetic electronic health record generation models. Nature Communications 13(1) (2022) https://doi.org/10.1038/s41467-022-35295-1 Murtaza et al. 
[2023] Murtaza, H., Ahmed, M., Khan, N.F., Murtaza, G., Zafar, S., Bano, A.: Synthetic data generation: State of the art in health care domain. Computer Science Review 48, 100546 (2023) https://doi.org/10.1016/j.cosrev.2023.100546 [22] Lautrup, A.D., Hyrup, T., Zimek, A., Schneider-Kamp, P.: Systematised review of generative modelling tools and utility metrics for fully synthetic tabular data. [under review] Gower [1971] Gower, J.C.: A general coefficient of similarity and some of its properties. Biometrics 27(4), 857–871 (1971) https://doi.org/10.2307/2528823 Kamal et al. [2019] Kamal, S., ElEleimy, M., Hegazy, D., Nasr, M.: Hepatitis C Virus (HCV) for Egyptian patients. UCI Machine Learning Repository, dataset (2019). https://doi.org/10.24432/c5989v Chawla et al. [2002] Chawla, N.V., Bowyer, K.W., Hall, L.O., Kegelmeyer, W.P.: SMOTE: synthetic minority over-sampling technique. J. Artif. Intell. Res. 16, 321–357 (2002) https://doi.org/10.1613/jair.953 Yoon et al. [2020] Yoon, J., Drumright, L.N., Schaar, M.: Anonymization through data synthesis using generative adversarial networks (ADS-GAN). IEEE J. Biomed. Health Informatics 24(8), 2378–2388 (2020) https://doi.org/10.1109/JBHI.2020.2980262 Xu et al. [2019] Xu, L., Skoularidou, M., Cuesta-Infante, A., Veeramachaneni, K.: Modeling tabular data using conditional GAN. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 7333–7343 (2019) Goodfellow et al. [2014] Goodfellow, I.J., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., Bengio, Y.: Generative Adversarial Networks. arXiv, preprint (2014). https://doi.org/10.48550/arXiv.1406.2661 Sun and Erath [2015] Sun, L., Erath, A.: A bayesian network approach for population synthesis. 
Transportation Research Part C: Emerging Technologies 61, 49–62 (2015) https://doi.org/10.1016/j.trc.2015.10.010 Reiter [2005] Reiter, J.P.: Using cart to generate partially synthetic public use microdata. Journal of Official Statistics 21(3), 441–462 (2005) Drechsler and Reiter [2011] Drechsler, J., Reiter, J.P.: An empirical evaluation of easily implemented, nonparametric methods for generating synthetic datasets. Comput. Stat. Data Anal. 55(12), 3232–3243 (2011) European Medicines Agency [2018] European Medicines Agency: External guidance on the implementation of the European Medicines Agency policy on the publication of clinical data for medicinal products for human use. https://www.ema.europa.eu/en/human-regulatory/marketing-authorisation/clinical-data-publication/support-industry/external-guidance-implementation-european-medicines-agency-policy-publication-clinical-data (2018) Health Canada [2019] Health Canada: Public Release of Clinical Information: guidance document. https://www.canada.ca/en/health-canada/services/drug-health-product-review-approval/profile-public-release-clinical-information-guidance/document.html (2019) Karr et al. [2006] Karr, A.F., Kohnen, C.N., Oganian, A., Reiter, J.P., Sanil, A.P.: A framework for evaluating the utility of data altered to protect confidentiality. The American Statistician 60(3), 224–232 (2006) https://doi.org/10.1198/000313006x124640 Zhu et al. [2022] Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). https://doi.org/10.1109/BigData55660.2022.10020639 Villani [2009] Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. 
[2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). 
https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 DataCebo, Inc.: Synthetic Data Metrics. (2023). DataCebo, Inc.. Version 0.9.3. https://docs.sdv.dev/sdmetrics/ Brenninkmeijer [2021] Brenninkmeijer, B.: Table Evaluator. GitHub (2021) Yale et al. [2019] Yale, A., Dash, S., Dutta, R., Guyon, I., Pavao, A., Bennett, K.P.: Privacy preserving synthetic health data. In: 27th European Symposium on Artificial Neural Networks, ESANN 2019, Bruges, Belgium, April 24-26, 2019 (2019) Yan et al. [2022] Yan, C., Yan, Y., Wan, Z., Zhang, Z., Omberg, L., Guinney, J., Mooney, S.D., Malin, B.A.: A multifaceted benchmarking of synthetic electronic health record generation models. Nature Communications 13(1) (2022) https://doi.org/10.1038/s41467-022-35295-1 Murtaza et al. [2023] Murtaza, H., Ahmed, M., Khan, N.F., Murtaza, G., Zafar, S., Bano, A.: Synthetic data generation: State of the art in health care domain. 
Computer Science Review 48, 100546 (2023) https://doi.org/10.1016/j.cosrev.2023.100546 [22] Lautrup, A.D., Hyrup, T., Zimek, A., Schneider-Kamp, P.: Systematised review of generative modelling tools and utility metrics for fully synthetic tabular data. [under review] Gower [1971] Gower, J.C.: A general coefficient of similarity and some of its properties. Biometrics 27(4), 857–871 (1971) https://doi.org/10.2307/2528823 Kamal et al. [2019] Kamal, S., ElEleimy, M., Hegazy, D., Nasr, M.: Hepatitis C Virus (HCV) for Egyptian patients. UCI Machine Learning Repository, dataset (2019). https://doi.org/10.24432/c5989v Chawla et al. [2002] Chawla, N.V., Bowyer, K.W., Hall, L.O., Kegelmeyer, W.P.: SMOTE: synthetic minority over-sampling technique. J. Artif. Intell. Res. 16, 321–357 (2002) https://doi.org/10.1613/jair.953 Yoon et al. [2020] Yoon, J., Drumright, L.N., Schaar, M.: Anonymization through data synthesis using generative adversarial networks (ADS-GAN). IEEE J. Biomed. Health Informatics 24(8), 2378–2388 (2020) https://doi.org/10.1109/JBHI.2020.2980262 Xu et al. [2019] Xu, L., Skoularidou, M., Cuesta-Infante, A., Veeramachaneni, K.: Modeling tabular data using conditional GAN. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 7333–7343 (2019) Goodfellow et al. [2014] Goodfellow, I.J., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., Bengio, Y.: Generative Adversarial Networks. arXiv, preprint (2014). https://doi.org/10.48550/arXiv.1406.2661 Sun and Erath [2015] Sun, L., Erath, A.: A bayesian network approach for population synthesis. 
Transportation Research Part C: Emerging Technologies 61, 49–62 (2015) https://doi.org/10.1016/j.trc.2015.10.010 Reiter [2005] Reiter, J.P.: Using cart to generate partially synthetic public use microdata. Journal of Official Statistics 21(3), 441–462 (2005) Drechsler and Reiter [2011] Drechsler, J., Reiter, J.P.: An empirical evaluation of easily implemented, nonparametric methods for generating synthetic datasets. Comput. Stat. Data Anal. 55(12), 3232–3243 (2011) European Medicines Agency [2018] European Medicines Agency: External guidance on the implementation of the European Medicines Agency policy on the publication of clinical data for medicinal products for human use. https://www.ema.europa.eu/en/human-regulatory/marketing-authorisation/clinical-data-publication/support-industry/external-guidance-implementation-european-medicines-agency-policy-publication-clinical-data (2018) Health Canada [2019] Health Canada: Public Release of Clinical Information: guidance document. https://www.canada.ca/en/health-canada/services/drug-health-product-review-approval/profile-public-release-clinical-information-guidance/document.html (2019) Karr et al. [2006] Karr, A.F., Kohnen, C.N., Oganian, A., Reiter, J.P., Sanil, A.P.: A framework for evaluating the utility of data altered to protect confidentiality. The American Statistician 60(3), 224–232 (2006) https://doi.org/10.1198/000313006x124640 Zhu et al. [2022] Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). https://doi.org/10.1109/BigData55660.2022.10020639 Villani [2009] Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. 
[2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). 
https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Brenninkmeijer, B.: Table Evaluator. GitHub (2021) Yale et al. [2019] Yale, A., Dash, S., Dutta, R., Guyon, I., Pavao, A., Bennett, K.P.: Privacy preserving synthetic health data. In: 27th European Symposium on Artificial Neural Networks, ESANN 2019, Bruges, Belgium, April 24-26, 2019 (2019) Yan et al. [2022] Yan, C., Yan, Y., Wan, Z., Zhang, Z., Omberg, L., Guinney, J., Mooney, S.D., Malin, B.A.: A multifaceted benchmarking of synthetic electronic health record generation models. Nature Communications 13(1) (2022) https://doi.org/10.1038/s41467-022-35295-1 Murtaza et al. [2023] Murtaza, H., Ahmed, M., Khan, N.F., Murtaza, G., Zafar, S., Bano, A.: Synthetic data generation: State of the art in health care domain. Computer Science Review 48, 100546 (2023) https://doi.org/10.1016/j.cosrev.2023.100546 [22] Lautrup, A.D., Hyrup, T., Zimek, A., Schneider-Kamp, P.: Systematised review of generative modelling tools and utility metrics for fully synthetic tabular data. 
[under review] Gower [1971] Gower, J.C.: A general coefficient of similarity and some of its properties. Biometrics 27(4), 857–871 (1971) https://doi.org/10.2307/2528823 Kamal et al. [2019] Kamal, S., ElEleimy, M., Hegazy, D., Nasr, M.: Hepatitis C Virus (HCV) for Egyptian patients. UCI Machine Learning Repository, dataset (2019). https://doi.org/10.24432/c5989v Chawla et al. [2002] Chawla, N.V., Bowyer, K.W., Hall, L.O., Kegelmeyer, W.P.: SMOTE: synthetic minority over-sampling technique. J. Artif. Intell. Res. 16, 321–357 (2002) https://doi.org/10.1613/jair.953 Yoon et al. [2020] Yoon, J., Drumright, L.N., Schaar, M.: Anonymization through data synthesis using generative adversarial networks (ADS-GAN). IEEE J. Biomed. Health Informatics 24(8), 2378–2388 (2020) https://doi.org/10.1109/JBHI.2020.2980262 Xu et al. [2019] Xu, L., Skoularidou, M., Cuesta-Infante, A., Veeramachaneni, K.: Modeling tabular data using conditional GAN. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 7333–7343 (2019) Goodfellow et al. [2014] Goodfellow, I.J., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., Bengio, Y.: Generative Adversarial Networks. arXiv, preprint (2014). https://doi.org/10.48550/arXiv.1406.2661 Sun and Erath [2015] Sun, L., Erath, A.: A bayesian network approach for population synthesis. Transportation Research Part C: Emerging Technologies 61, 49–62 (2015) https://doi.org/10.1016/j.trc.2015.10.010 Reiter [2005] Reiter, J.P.: Using cart to generate partially synthetic public use microdata. Journal of Official Statistics 21(3), 441–462 (2005) Drechsler and Reiter [2011] Drechsler, J., Reiter, J.P.: An empirical evaluation of easily implemented, nonparametric methods for generating synthetic datasets. Comput. 
Stat. Data Anal. 55(12), 3232–3243 (2011) European Medicines Agency [2018] European Medicines Agency: External guidance on the implementation of the European Medicines Agency policy on the publication of clinical data for medicinal products for human use. https://www.ema.europa.eu/en/human-regulatory/marketing-authorisation/clinical-data-publication/support-industry/external-guidance-implementation-european-medicines-agency-policy-publication-clinical-data (2018) Health Canada [2019] Health Canada: Public Release of Clinical Information: guidance document. https://www.canada.ca/en/health-canada/services/drug-health-product-review-approval/profile-public-release-clinical-information-guidance/document.html (2019) Karr et al. [2006] Karr, A.F., Kohnen, C.N., Oganian, A., Reiter, J.P., Sanil, A.P.: A framework for evaluating the utility of data altered to protect confidentiality. The American Statistician 60(3), 224–232 (2006) https://doi.org/10.1198/000313006x124640 Zhu et al. [2022] Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). https://doi.org/10.1109/BigData55660.2022.10020639 Villani [2009] Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. [2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. 
IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. 
[2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. 
[2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Chawla, N.V., Bowyer, K.W., Hall, L.O., Kegelmeyer, W.P.: SMOTE: synthetic minority over-sampling technique. J. Artif. Intell. Res. 16, 321–357 (2002) https://doi.org/10.1613/jair.953 Yoon et al. [2020] Yoon, J., Drumright, L.N., Schaar, M.: Anonymization through data synthesis using generative adversarial networks (ADS-GAN). IEEE J. Biomed. Health Informatics 24(8), 2378–2388 (2020) https://doi.org/10.1109/JBHI.2020.2980262 Xu et al. [2019] Xu, L., Skoularidou, M., Cuesta-Infante, A., Veeramachaneni, K.: Modeling tabular data using conditional GAN. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 7333–7343 (2019) Goodfellow et al. [2014] Goodfellow, I.J., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., Bengio, Y.: Generative Adversarial Networks. arXiv, preprint (2014). https://doi.org/10.48550/arXiv.1406.2661 Sun and Erath [2015] Sun, L., Erath, A.: A bayesian network approach for population synthesis. Transportation Research Part C: Emerging Technologies 61, 49–62 (2015) https://doi.org/10.1016/j.trc.2015.10.010 Reiter [2005] Reiter, J.P.: Using cart to generate partially synthetic public use microdata. Journal of Official Statistics 21(3), 441–462 (2005) Drechsler and Reiter [2011] Drechsler, J., Reiter, J.P.: An empirical evaluation of easily implemented, nonparametric methods for generating synthetic datasets. Comput. Stat. Data Anal. 
55(12), 3232–3243 (2011) European Medicines Agency [2018] European Medicines Agency: External guidance on the implementation of the European Medicines Agency policy on the publication of clinical data for medicinal products for human use. https://www.ema.europa.eu/en/human-regulatory/marketing-authorisation/clinical-data-publication/support-industry/external-guidance-implementation-european-medicines-agency-policy-publication-clinical-data (2018) Health Canada [2019] Health Canada: Public Release of Clinical Information: guidance document. https://www.canada.ca/en/health-canada/services/drug-health-product-review-approval/profile-public-release-clinical-information-guidance/document.html (2019) Karr et al. [2006] Karr, A.F., Kohnen, C.N., Oganian, A., Reiter, J.P., Sanil, A.P.: A framework for evaluating the utility of data altered to protect confidentiality. The American Statistician 60(3), 224–232 (2006) https://doi.org/10.1198/000313006x124640 Zhu et al. [2022] Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). https://doi.org/10.1109/BigData55660.2022.10020639 Villani [2009] Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. [2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. 
IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. 
[2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Yoon, J., Drumright, L.N., Schaar, M.: Anonymization through data synthesis using generative adversarial networks (ADS-GAN). IEEE J. Biomed. Health Informatics 24(8), 2378–2388 (2020) https://doi.org/10.1109/JBHI.2020.2980262 Xu et al. [2019] Xu, L., Skoularidou, M., Cuesta-Infante, A., Veeramachaneni, K.: Modeling tabular data using conditional GAN. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 7333–7343 (2019) Goodfellow et al. [2014] Goodfellow, I.J., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., Bengio, Y.: Generative Adversarial Networks. arXiv, preprint (2014). https://doi.org/10.48550/arXiv.1406.2661 Sun and Erath [2015] Sun, L., Erath, A.: A bayesian network approach for population synthesis. Transportation Research Part C: Emerging Technologies 61, 49–62 (2015) https://doi.org/10.1016/j.trc.2015.10.010 Reiter [2005] Reiter, J.P.: Using cart to generate partially synthetic public use microdata. Journal of Official Statistics 21(3), 441–462 (2005) Drechsler and Reiter [2011] Drechsler, J., Reiter, J.P.: An empirical evaluation of easily implemented, nonparametric methods for generating synthetic datasets. Comput. Stat. Data Anal. 
55(12), 3232–3243 (2011) European Medicines Agency [2018] European Medicines Agency: External guidance on the implementation of the European Medicines Agency policy on the publication of clinical data for medicinal products for human use. https://www.ema.europa.eu/en/human-regulatory/marketing-authorisation/clinical-data-publication/support-industry/external-guidance-implementation-european-medicines-agency-policy-publication-clinical-data (2018) Health Canada [2019] Health Canada: Public Release of Clinical Information: guidance document. https://www.canada.ca/en/health-canada/services/drug-health-product-review-approval/profile-public-release-clinical-information-guidance/document.html (2019) Karr et al. [2006] Karr, A.F., Kohnen, C.N., Oganian, A., Reiter, J.P., Sanil, A.P.: A framework for evaluating the utility of data altered to protect confidentiality. The American Statistician 60(3), 224–232 (2006) https://doi.org/10.1198/000313006x124640 Zhu et al. [2022] Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). https://doi.org/10.1109/BigData55660.2022.10020639 Villani [2009] Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. [2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. 
IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. 
[2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Xu, L., Skoularidou, M., Cuesta-Infante, A., Veeramachaneni, K.: Modeling tabular data using conditional GAN. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 7333–7343 (2019) Goodfellow et al. [2014] Goodfellow, I.J., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., Bengio, Y.: Generative Adversarial Networks. arXiv, preprint (2014). https://doi.org/10.48550/arXiv.1406.2661 Sun and Erath [2015] Sun, L., Erath, A.: A bayesian network approach for population synthesis. Transportation Research Part C: Emerging Technologies 61, 49–62 (2015) https://doi.org/10.1016/j.trc.2015.10.010 Reiter [2005] Reiter, J.P.: Using cart to generate partially synthetic public use microdata. Journal of Official Statistics 21(3), 441–462 (2005) Drechsler and Reiter [2011] Drechsler, J., Reiter, J.P.: An empirical evaluation of easily implemented, nonparametric methods for generating synthetic datasets. Comput. Stat. Data Anal. 55(12), 3232–3243 (2011) European Medicines Agency [2018] European Medicines Agency: External guidance on the implementation of the European Medicines Agency policy on the publication of clinical data for medicinal products for human use. 
https://www.ema.europa.eu/en/human-regulatory/marketing-authorisation/clinical-data-publication/support-industry/external-guidance-implementation-european-medicines-agency-policy-publication-clinical-data (2018) Health Canada [2019] Health Canada: Public Release of Clinical Information: guidance document. https://www.canada.ca/en/health-canada/services/drug-health-product-review-approval/profile-public-release-clinical-information-guidance/document.html (2019) Karr et al. [2006] Karr, A.F., Kohnen, C.N., Oganian, A., Reiter, J.P., Sanil, A.P.: A framework for evaluating the utility of data altered to protect confidentiality. The American Statistician 60(3), 224–232 (2006) https://doi.org/10.1198/000313006x124640 Zhu et al. [2022] Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). https://doi.org/10.1109/BigData55660.2022.10020639 Villani [2009] Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. [2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. 
[2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. 
[2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Goodfellow, I.J., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., Bengio, Y.: Generative Adversarial Networks. arXiv, preprint (2014). https://doi.org/10.48550/arXiv.1406.2661 Sun and Erath [2015] Sun, L., Erath, A.: A bayesian network approach for population synthesis. Transportation Research Part C: Emerging Technologies 61, 49–62 (2015) https://doi.org/10.1016/j.trc.2015.10.010 Reiter [2005] Reiter, J.P.: Using cart to generate partially synthetic public use microdata. Journal of Official Statistics 21(3), 441–462 (2005) Drechsler and Reiter [2011] Drechsler, J., Reiter, J.P.: An empirical evaluation of easily implemented, nonparametric methods for generating synthetic datasets. Comput. Stat. Data Anal. 55(12), 3232–3243 (2011) European Medicines Agency [2018] European Medicines Agency: External guidance on the implementation of the European Medicines Agency policy on the publication of clinical data for medicinal products for human use. https://www.ema.europa.eu/en/human-regulatory/marketing-authorisation/clinical-data-publication/support-industry/external-guidance-implementation-european-medicines-agency-policy-publication-clinical-data (2018) Health Canada [2019] Health Canada: Public Release of Clinical Information: guidance document. https://www.canada.ca/en/health-canada/services/drug-health-product-review-approval/profile-public-release-clinical-information-guidance/document.html (2019) Karr et al. [2006] Karr, A.F., Kohnen, C.N., Oganian, A., Reiter, J.P., Sanil, A.P.: A framework for evaluating the utility of data altered to protect confidentiality. The American Statistician 60(3), 224–232 (2006) https://doi.org/10.1198/000313006x124640 Zhu et al. [2022] Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. 
In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). https://doi.org/10.1109/BigData55660.2022.10020639 Villani [2009] Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. [2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 
14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Sun, L., Erath, A.: A bayesian network approach for population synthesis. Transportation Research Part C: Emerging Technologies 61, 49–62 (2015) https://doi.org/10.1016/j.trc.2015.10.010 Reiter [2005] Reiter, J.P.: Using cart to generate partially synthetic public use microdata. Journal of Official Statistics 21(3), 441–462 (2005) Drechsler and Reiter [2011] Drechsler, J., Reiter, J.P.: An empirical evaluation of easily implemented, nonparametric methods for generating synthetic datasets. Comput. Stat. Data Anal. 
[2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. 
[2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 
13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. 
AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. 
[2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083
  3. Ping, H., Stoyanovich, J., Howe, B.: Datasynthesizer: Privacy-preserving synthetic datasets. In: Proceedings of the 29th International Conference on Scientific and Statistical Database Management. ACM, Chicago, IL, USA, June 27-29 (2017). https://doi.org/10.1145/3085504.3091117 Emam et al. [2020] Emam, K.E., Mosquera, L., Bass, J.: Evaluating identity disclosure risk in fully synthetic health data: Model development and validation. Journal of Medical Internet Research 22(11), 23139 (2020) https://doi.org/10.2196/23139 Abouelmehdi et al. [2018] Abouelmehdi, K., Beni-Hessane, A., Khaloufi, H.: Big healthcare data: preserving security and privacy. Journal of Big Data 5(1) (2018) https://doi.org/10.1186/s40537-017-0110-7 Nowok et al. [2016] Nowok, B., Raab, G.M., Dibben, C.: synthpop: Bespoke creation of synthetic data in r. Journal of Statistical Software 74(11) (2016) https://doi.org/10.18637/jss.v074.i11 Rankin et al. [2020] Rankin, D., Black, M., Bond, R., Wallace, J., Mulvenna, M., Epelde, G.: Reliability of supervised machine learning using synthetic data in health care: Model to preserve privacy for data sharing. JMIR Medical Informatics 8(7), 18910 (2020) https://doi.org/10.2196/18910 van Breugel et al. [2021] Breugel, B., Kyono, T., Berrevoets, J., Schaar, M.: DECAF: generating fair synthetic data using causally-aware generative networks. In: Ranzato, M., Beygelzimer, A., Dauphin, Y.N., Liang, P., Vaughan, J.W. (eds.) Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6-14, 2021, Virtual, pp. 22221–22233 (2021) Dankar et al. [2022] Dankar, F.K., Ibrahim, M.K., Ismail, L.: A multi-dimensional evaluation of synthetic data generators. IEEE Access 10, 11147–11158 (2022) https://doi.org/10.1109/access.2022.3144765 Figueira and Vaz [2022] Figueira, A., Vaz, B.: Survey on synthetic data generation, evaluation methods and GANs. 
Mathematics 10(15), 2733 (2022) https://doi.org/10.3390/math10152733 Dwork and Roth [2013] Dwork, C., Roth, A.: The algorithmic foundations of differential privacy. Foundations and Trends® in Theoretical Computer Science 9(3-4), 211–487 (2013) https://doi.org/10.1561/0400000042 Yale et al. [2020] Yale, A., Dash, S., Bhanot, K., Guyon, I., Erickson, J.S., Bennett, K.P.: Synthesizing quality open data assets from private health research studies. In: Abramowicz, W., Klein, G. (eds.) Business Information Systems Workshops - BIS 2020 International Workshops, Colorado Springs, CO, USA, June 8-10, 2020, Revised Selected Papers. Lecture Notes in Business Information Processing, vol. 394, pp. 324–335 (2020). https://doi.org/10.1007/978-3-030-61146-0_26 Lenz et al. [2021] Lenz, S., Hess, M., Binder, H.: Deep generative models in DataSHIELD. BMC Medical Research Methodology 21(1) (2021) https://doi.org/10.1186/s12874-021-01237-6 Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X., El-Hussuna, A.: Utility metrics for evaluating synthetic health data generation methods: Validation study. JMIR Medical Informatics 10(4), 35734 (2022) https://doi.org/10.2196/35734 Qian et al. [2023] Qian, Z., Cebere, B., Schaar, M.: Synthcity: facilitating innovative use cases of synthetic data in different data modalities. arXiv, preprint (2023). https://doi.org/10.48550/arXiv.2301.07573 DataCebo, Inc. [2023] DataCebo, Inc.: Synthetic Data Metrics. (2023). DataCebo, Inc.. Version 0.9.3. https://docs.sdv.dev/sdmetrics/ Brenninkmeijer [2021] Brenninkmeijer, B.: Table Evaluator. GitHub (2021) Yale et al. [2019] Yale, A., Dash, S., Dutta, R., Guyon, I., Pavao, A., Bennett, K.P.: Privacy preserving synthetic health data. In: 27th European Symposium on Artificial Neural Networks, ESANN 2019, Bruges, Belgium, April 24-26, 2019 (2019) Yan et al. 
[2022] Yan, C., Yan, Y., Wan, Z., Zhang, Z., Omberg, L., Guinney, J., Mooney, S.D., Malin, B.A.: A multifaceted benchmarking of synthetic electronic health record generation models. Nature Communications 13(1) (2022) https://doi.org/10.1038/s41467-022-35295-1 Murtaza et al. [2023] Murtaza, H., Ahmed, M., Khan, N.F., Murtaza, G., Zafar, S., Bano, A.: Synthetic data generation: State of the art in health care domain. Computer Science Review 48, 100546 (2023) https://doi.org/10.1016/j.cosrev.2023.100546 [22] Lautrup, A.D., Hyrup, T., Zimek, A., Schneider-Kamp, P.: Systematised review of generative modelling tools and utility metrics for fully synthetic tabular data. [under review] Gower [1971] Gower, J.C.: A general coefficient of similarity and some of its properties. Biometrics 27(4), 857–871 (1971) https://doi.org/10.2307/2528823 Kamal et al. [2019] Kamal, S., ElEleimy, M., Hegazy, D., Nasr, M.: Hepatitis C Virus (HCV) for Egyptian patients. UCI Machine Learning Repository, dataset (2019). https://doi.org/10.24432/c5989v Chawla et al. [2002] Chawla, N.V., Bowyer, K.W., Hall, L.O., Kegelmeyer, W.P.: SMOTE: synthetic minority over-sampling technique. J. Artif. Intell. Res. 16, 321–357 (2002) https://doi.org/10.1613/jair.953 Yoon et al. [2020] Yoon, J., Drumright, L.N., Schaar, M.: Anonymization through data synthesis using generative adversarial networks (ADS-GAN). IEEE J. Biomed. Health Informatics 24(8), 2378–2388 (2020) https://doi.org/10.1109/JBHI.2020.2980262 Xu et al. [2019] Xu, L., Skoularidou, M., Cuesta-Infante, A., Veeramachaneni, K.: Modeling tabular data using conditional GAN. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 7333–7343 (2019) Goodfellow et al. 
[2014] Goodfellow, I.J., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., Bengio, Y.: Generative Adversarial Networks. arXiv, preprint (2014). https://doi.org/10.48550/arXiv.1406.2661 Sun and Erath [2015] Sun, L., Erath, A.: A bayesian network approach for population synthesis. Transportation Research Part C: Emerging Technologies 61, 49–62 (2015) https://doi.org/10.1016/j.trc.2015.10.010 Reiter [2005] Reiter, J.P.: Using cart to generate partially synthetic public use microdata. Journal of Official Statistics 21(3), 441–462 (2005) Drechsler and Reiter [2011] Drechsler, J., Reiter, J.P.: An empirical evaluation of easily implemented, nonparametric methods for generating synthetic datasets. Comput. Stat. Data Anal. 55(12), 3232–3243 (2011) European Medicines Agency [2018] European Medicines Agency: External guidance on the implementation of the European Medicines Agency policy on the publication of clinical data for medicinal products for human use. https://www.ema.europa.eu/en/human-regulatory/marketing-authorisation/clinical-data-publication/support-industry/external-guidance-implementation-european-medicines-agency-policy-publication-clinical-data (2018) Health Canada [2019] Health Canada: Public Release of Clinical Information: guidance document. https://www.canada.ca/en/health-canada/services/drug-health-product-review-approval/profile-public-release-clinical-information-guidance/document.html (2019) Karr et al. [2006] Karr, A.F., Kohnen, C.N., Oganian, A., Reiter, J.P., Sanil, A.P.: A framework for evaluating the utility of data altered to protect confidentiality. The American Statistician 60(3), 224–232 (2006) https://doi.org/10.1198/000313006x124640 Zhu et al. [2022] Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) 
IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). https://doi.org/10.1109/BigData55660.2022.10020639 Villani [2009] Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. [2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. 
[2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Emam, K.E., Mosquera, L., Bass, J.: Evaluating identity disclosure risk in fully synthetic health data: Model development and validation. Journal of Medical Internet Research 22(11), 23139 (2020) https://doi.org/10.2196/23139 Abouelmehdi et al. [2018] Abouelmehdi, K., Beni-Hessane, A., Khaloufi, H.: Big healthcare data: preserving security and privacy. Journal of Big Data 5(1) (2018) https://doi.org/10.1186/s40537-017-0110-7 Nowok et al. [2016] Nowok, B., Raab, G.M., Dibben, C.: synthpop: Bespoke creation of synthetic data in r. Journal of Statistical Software 74(11) (2016) https://doi.org/10.18637/jss.v074.i11 Rankin et al. 
[2020] Rankin, D., Black, M., Bond, R., Wallace, J., Mulvenna, M., Epelde, G.: Reliability of supervised machine learning using synthetic data in health care: Model to preserve privacy for data sharing. JMIR Medical Informatics 8(7), 18910 (2020) https://doi.org/10.2196/18910 van Breugel et al. [2021] Breugel, B., Kyono, T., Berrevoets, J., Schaar, M.: DECAF: generating fair synthetic data using causally-aware generative networks. In: Ranzato, M., Beygelzimer, A., Dauphin, Y.N., Liang, P., Vaughan, J.W. (eds.) Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6-14, 2021, Virtual, pp. 22221–22233 (2021) Dankar et al. [2022] Dankar, F.K., Ibrahim, M.K., Ismail, L.: A multi-dimensional evaluation of synthetic data generators. IEEE Access 10, 11147–11158 (2022) https://doi.org/10.1109/access.2022.3144765 Figueira and Vaz [2022] Figueira, A., Vaz, B.: Survey on synthetic data generation, evaluation methods and GANs. Mathematics 10(15), 2733 (2022) https://doi.org/10.3390/math10152733 Dwork and Roth [2013] Dwork, C., Roth, A.: The algorithmic foundations of differential privacy. Foundations and Trends® in Theoretical Computer Science 9(3-4), 211–487 (2013) https://doi.org/10.1561/0400000042 Yale et al. [2020] Yale, A., Dash, S., Bhanot, K., Guyon, I., Erickson, J.S., Bennett, K.P.: Synthesizing quality open data assets from private health research studies. In: Abramowicz, W., Klein, G. (eds.) Business Information Systems Workshops - BIS 2020 International Workshops, Colorado Springs, CO, USA, June 8-10, 2020, Revised Selected Papers. Lecture Notes in Business Information Processing, vol. 394, pp. 324–335 (2020). https://doi.org/10.1007/978-3-030-61146-0_26 Lenz et al. [2021] Lenz, S., Hess, M., Binder, H.: Deep generative models in DataSHIELD. BMC Medical Research Methodology 21(1) (2021) https://doi.org/10.1186/s12874-021-01237-6 Emam et al. 
[2022] Emam, K.E., Mosquera, L., Fang, X., El-Hussuna, A.: Utility metrics for evaluating synthetic health data generation methods: Validation study. JMIR Medical Informatics 10(4), 35734 (2022) https://doi.org/10.2196/35734 Qian et al. [2023] Qian, Z., Cebere, B., Schaar, M.: Synthcity: facilitating innovative use cases of synthetic data in different data modalities. arXiv, preprint (2023). https://doi.org/10.48550/arXiv.2301.07573 DataCebo, Inc. [2023] DataCebo, Inc.: Synthetic Data Metrics. (2023). DataCebo, Inc.. Version 0.9.3. https://docs.sdv.dev/sdmetrics/ Brenninkmeijer [2021] Brenninkmeijer, B.: Table Evaluator. GitHub (2021) Yale et al. [2019] Yale, A., Dash, S., Dutta, R., Guyon, I., Pavao, A., Bennett, K.P.: Privacy preserving synthetic health data. In: 27th European Symposium on Artificial Neural Networks, ESANN 2019, Bruges, Belgium, April 24-26, 2019 (2019) Yan et al. [2022] Yan, C., Yan, Y., Wan, Z., Zhang, Z., Omberg, L., Guinney, J., Mooney, S.D., Malin, B.A.: A multifaceted benchmarking of synthetic electronic health record generation models. Nature Communications 13(1) (2022) https://doi.org/10.1038/s41467-022-35295-1 Murtaza et al. [2023] Murtaza, H., Ahmed, M., Khan, N.F., Murtaza, G., Zafar, S., Bano, A.: Synthetic data generation: State of the art in health care domain. Computer Science Review 48, 100546 (2023) https://doi.org/10.1016/j.cosrev.2023.100546 [22] Lautrup, A.D., Hyrup, T., Zimek, A., Schneider-Kamp, P.: Systematised review of generative modelling tools and utility metrics for fully synthetic tabular data. [under review] Gower [1971] Gower, J.C.: A general coefficient of similarity and some of its properties. Biometrics 27(4), 857–871 (1971) https://doi.org/10.2307/2528823 Kamal et al. [2019] Kamal, S., ElEleimy, M., Hegazy, D., Nasr, M.: Hepatitis C Virus (HCV) for Egyptian patients. UCI Machine Learning Repository, dataset (2019). https://doi.org/10.24432/c5989v Chawla et al. 
[2002] Chawla, N.V., Bowyer, K.W., Hall, L.O., Kegelmeyer, W.P.: SMOTE: synthetic minority over-sampling technique. J. Artif. Intell. Res. 16, 321–357 (2002) https://doi.org/10.1613/jair.953 Yoon et al. [2020] Yoon, J., Drumright, L.N., Schaar, M.: Anonymization through data synthesis using generative adversarial networks (ADS-GAN). IEEE J. Biomed. Health Informatics 24(8), 2378–2388 (2020) https://doi.org/10.1109/JBHI.2020.2980262 Xu et al. [2019] Xu, L., Skoularidou, M., Cuesta-Infante, A., Veeramachaneni, K.: Modeling tabular data using conditional GAN. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 7333–7343 (2019) Goodfellow et al. [2014] Goodfellow, I.J., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., Bengio, Y.: Generative Adversarial Networks. arXiv, preprint (2014). https://doi.org/10.48550/arXiv.1406.2661 Sun and Erath [2015] Sun, L., Erath, A.: A bayesian network approach for population synthesis. Transportation Research Part C: Emerging Technologies 61, 49–62 (2015) https://doi.org/10.1016/j.trc.2015.10.010 Reiter [2005] Reiter, J.P.: Using cart to generate partially synthetic public use microdata. Journal of Official Statistics 21(3), 441–462 (2005) Drechsler and Reiter [2011] Drechsler, J., Reiter, J.P.: An empirical evaluation of easily implemented, nonparametric methods for generating synthetic datasets. Comput. Stat. Data Anal. 55(12), 3232–3243 (2011) European Medicines Agency [2018] European Medicines Agency: External guidance on the implementation of the European Medicines Agency policy on the publication of clinical data for medicinal products for human use. 
https://www.ema.europa.eu/en/human-regulatory/marketing-authorisation/clinical-data-publication/support-industry/external-guidance-implementation-european-medicines-agency-policy-publication-clinical-data (2018) Health Canada [2019] Health Canada: Public Release of Clinical Information: guidance document. https://www.canada.ca/en/health-canada/services/drug-health-product-review-approval/profile-public-release-clinical-information-guidance/document.html (2019) Karr et al. [2006] Karr, A.F., Kohnen, C.N., Oganian, A., Reiter, J.P., Sanil, A.P.: A framework for evaluating the utility of data altered to protect confidentiality. The American Statistician 60(3), 224–232 (2006) https://doi.org/10.1198/000313006x124640 Zhu et al. [2022] Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). https://doi.org/10.1109/BigData55660.2022.10020639 Villani [2009] Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. [2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. 
[2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. 
[2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Abouelmehdi, K., Beni-Hessane, A., Khaloufi, H.: Big healthcare data: preserving security and privacy. Journal of Big Data 5(1) (2018) https://doi.org/10.1186/s40537-017-0110-7 Nowok et al. [2016] Nowok, B., Raab, G.M., Dibben, C.: synthpop: Bespoke creation of synthetic data in r. Journal of Statistical Software 74(11) (2016) https://doi.org/10.18637/jss.v074.i11 Rankin et al. [2020] Rankin, D., Black, M., Bond, R., Wallace, J., Mulvenna, M., Epelde, G.: Reliability of supervised machine learning using synthetic data in health care: Model to preserve privacy for data sharing. JMIR Medical Informatics 8(7), 18910 (2020) https://doi.org/10.2196/18910 van Breugel et al. [2021] Breugel, B., Kyono, T., Berrevoets, J., Schaar, M.: DECAF: generating fair synthetic data using causally-aware generative networks. In: Ranzato, M., Beygelzimer, A., Dauphin, Y.N., Liang, P., Vaughan, J.W. (eds.) Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6-14, 2021, Virtual, pp. 22221–22233 (2021) Dankar et al. [2022] Dankar, F.K., Ibrahim, M.K., Ismail, L.: A multi-dimensional evaluation of synthetic data generators. IEEE Access 10, 11147–11158 (2022) https://doi.org/10.1109/access.2022.3144765 Figueira and Vaz [2022] Figueira, A., Vaz, B.: Survey on synthetic data generation, evaluation methods and GANs. Mathematics 10(15), 2733 (2022) https://doi.org/10.3390/math10152733 Dwork and Roth [2013] Dwork, C., Roth, A.: The algorithmic foundations of differential privacy. Foundations and Trends® in Theoretical Computer Science 9(3-4), 211–487 (2013) https://doi.org/10.1561/0400000042 Yale et al. 
[2020] Yale, A., Dash, S., Bhanot, K., Guyon, I., Erickson, J.S., Bennett, K.P.: Synthesizing quality open data assets from private health research studies. In: Abramowicz, W., Klein, G. (eds.) Business Information Systems Workshops - BIS 2020 International Workshops, Colorado Springs, CO, USA, June 8-10, 2020, Revised Selected Papers. Lecture Notes in Business Information Processing, vol. 394, pp. 324–335 (2020). https://doi.org/10.1007/978-3-030-61146-0_26 Lenz et al. [2021] Lenz, S., Hess, M., Binder, H.: Deep generative models in DataSHIELD. BMC Medical Research Methodology 21(1) (2021) https://doi.org/10.1186/s12874-021-01237-6 Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X., El-Hussuna, A.: Utility metrics for evaluating synthetic health data generation methods: Validation study. JMIR Medical Informatics 10(4), 35734 (2022) https://doi.org/10.2196/35734 Qian et al. [2023] Qian, Z., Cebere, B., Schaar, M.: Synthcity: facilitating innovative use cases of synthetic data in different data modalities. arXiv, preprint (2023). https://doi.org/10.48550/arXiv.2301.07573 DataCebo, Inc. [2023] DataCebo, Inc.: Synthetic Data Metrics. (2023). DataCebo, Inc.. Version 0.9.3. https://docs.sdv.dev/sdmetrics/ Brenninkmeijer [2021] Brenninkmeijer, B.: Table Evaluator. GitHub (2021) Yale et al. [2019] Yale, A., Dash, S., Dutta, R., Guyon, I., Pavao, A., Bennett, K.P.: Privacy preserving synthetic health data. In: 27th European Symposium on Artificial Neural Networks, ESANN 2019, Bruges, Belgium, April 24-26, 2019 (2019) Yan et al. [2022] Yan, C., Yan, Y., Wan, Z., Zhang, Z., Omberg, L., Guinney, J., Mooney, S.D., Malin, B.A.: A multifaceted benchmarking of synthetic electronic health record generation models. Nature Communications 13(1) (2022) https://doi.org/10.1038/s41467-022-35295-1 Murtaza et al. [2023] Murtaza, H., Ahmed, M., Khan, N.F., Murtaza, G., Zafar, S., Bano, A.: Synthetic data generation: State of the art in health care domain. 
Computer Science Review 48, 100546 (2023) https://doi.org/10.1016/j.cosrev.2023.100546 [22] Lautrup, A.D., Hyrup, T., Zimek, A., Schneider-Kamp, P.: Systematised review of generative modelling tools and utility metrics for fully synthetic tabular data. [under review] Gower [1971] Gower, J.C.: A general coefficient of similarity and some of its properties. Biometrics 27(4), 857–871 (1971) https://doi.org/10.2307/2528823 Kamal et al. [2019] Kamal, S., ElEleimy, M., Hegazy, D., Nasr, M.: Hepatitis C Virus (HCV) for Egyptian patients. UCI Machine Learning Repository, dataset (2019). https://doi.org/10.24432/c5989v Chawla et al. [2002] Chawla, N.V., Bowyer, K.W., Hall, L.O., Kegelmeyer, W.P.: SMOTE: synthetic minority over-sampling technique. J. Artif. Intell. Res. 16, 321–357 (2002) https://doi.org/10.1613/jair.953 Yoon et al. [2020] Yoon, J., Drumright, L.N., Schaar, M.: Anonymization through data synthesis using generative adversarial networks (ADS-GAN). IEEE J. Biomed. Health Informatics 24(8), 2378–2388 (2020) https://doi.org/10.1109/JBHI.2020.2980262 Xu et al. [2019] Xu, L., Skoularidou, M., Cuesta-Infante, A., Veeramachaneni, K.: Modeling tabular data using conditional GAN. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 7333–7343 (2019) Goodfellow et al. [2014] Goodfellow, I.J., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., Bengio, Y.: Generative Adversarial Networks. arXiv, preprint (2014). https://doi.org/10.48550/arXiv.1406.2661 Sun and Erath [2015] Sun, L., Erath, A.: A bayesian network approach for population synthesis. 
Transportation Research Part C: Emerging Technologies 61, 49–62 (2015) https://doi.org/10.1016/j.trc.2015.10.010 Reiter [2005] Reiter, J.P.: Using cart to generate partially synthetic public use microdata. Journal of Official Statistics 21(3), 441–462 (2005) Drechsler and Reiter [2011] Drechsler, J., Reiter, J.P.: An empirical evaluation of easily implemented, nonparametric methods for generating synthetic datasets. Comput. Stat. Data Anal. 55(12), 3232–3243 (2011) European Medicines Agency [2018] European Medicines Agency: External guidance on the implementation of the European Medicines Agency policy on the publication of clinical data for medicinal products for human use. https://www.ema.europa.eu/en/human-regulatory/marketing-authorisation/clinical-data-publication/support-industry/external-guidance-implementation-european-medicines-agency-policy-publication-clinical-data (2018) Health Canada [2019] Health Canada: Public Release of Clinical Information: guidance document. https://www.canada.ca/en/health-canada/services/drug-health-product-review-approval/profile-public-release-clinical-information-guidance/document.html (2019) Karr et al. [2006] Karr, A.F., Kohnen, C.N., Oganian, A., Reiter, J.P., Sanil, A.P.: A framework for evaluating the utility of data altered to protect confidentiality. The American Statistician 60(3), 224–232 (2006) https://doi.org/10.1198/000313006x124640 Zhu et al. [2022] Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). https://doi.org/10.1109/BigData55660.2022.10020639 Villani [2009] Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. 
[2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). 
https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Nowok, B., Raab, G.M., Dibben, C.: synthpop: Bespoke creation of synthetic data in r. Journal of Statistical Software 74(11) (2016) https://doi.org/10.18637/jss.v074.i11 Rankin et al. [2020] Rankin, D., Black, M., Bond, R., Wallace, J., Mulvenna, M., Epelde, G.: Reliability of supervised machine learning using synthetic data in health care: Model to preserve privacy for data sharing. JMIR Medical Informatics 8(7), 18910 (2020) https://doi.org/10.2196/18910 van Breugel et al. [2021] Breugel, B., Kyono, T., Berrevoets, J., Schaar, M.: DECAF: generating fair synthetic data using causally-aware generative networks. In: Ranzato, M., Beygelzimer, A., Dauphin, Y.N., Liang, P., Vaughan, J.W. (eds.) Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6-14, 2021, Virtual, pp. 22221–22233 (2021) Dankar et al. [2022] Dankar, F.K., Ibrahim, M.K., Ismail, L.: A multi-dimensional evaluation of synthetic data generators. 
IEEE Access 10, 11147–11158 (2022) https://doi.org/10.1109/access.2022.3144765 Figueira and Vaz [2022] Figueira, A., Vaz, B.: Survey on synthetic data generation, evaluation methods and GANs. Mathematics 10(15), 2733 (2022) https://doi.org/10.3390/math10152733 Dwork and Roth [2013] Dwork, C., Roth, A.: The algorithmic foundations of differential privacy. Foundations and Trends® in Theoretical Computer Science 9(3-4), 211–487 (2013) https://doi.org/10.1561/0400000042 Yale et al. [2020] Yale, A., Dash, S., Bhanot, K., Guyon, I., Erickson, J.S., Bennett, K.P.: Synthesizing quality open data assets from private health research studies. In: Abramowicz, W., Klein, G. (eds.) Business Information Systems Workshops - BIS 2020 International Workshops, Colorado Springs, CO, USA, June 8-10, 2020, Revised Selected Papers. Lecture Notes in Business Information Processing, vol. 394, pp. 324–335 (2020). https://doi.org/10.1007/978-3-030-61146-0_26 Lenz et al. [2021] Lenz, S., Hess, M., Binder, H.: Deep generative models in DataSHIELD. BMC Medical Research Methodology 21(1) (2021) https://doi.org/10.1186/s12874-021-01237-6 Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X., El-Hussuna, A.: Utility metrics for evaluating synthetic health data generation methods: Validation study. JMIR Medical Informatics 10(4), 35734 (2022) https://doi.org/10.2196/35734 Qian et al. [2023] Qian, Z., Cebere, B., Schaar, M.: Synthcity: facilitating innovative use cases of synthetic data in different data modalities. arXiv, preprint (2023). https://doi.org/10.48550/arXiv.2301.07573 DataCebo, Inc. [2023] DataCebo, Inc.: Synthetic Data Metrics. (2023). DataCebo, Inc.. Version 0.9.3. https://docs.sdv.dev/sdmetrics/ Brenninkmeijer [2021] Brenninkmeijer, B.: Table Evaluator. GitHub (2021) Yale et al. [2019] Yale, A., Dash, S., Dutta, R., Guyon, I., Pavao, A., Bennett, K.P.: Privacy preserving synthetic health data. 
In: 27th European Symposium on Artificial Neural Networks, ESANN 2019, Bruges, Belgium, April 24-26, 2019 (2019) Yan et al. [2022] Yan, C., Yan, Y., Wan, Z., Zhang, Z., Omberg, L., Guinney, J., Mooney, S.D., Malin, B.A.: A multifaceted benchmarking of synthetic electronic health record generation models. Nature Communications 13(1) (2022) https://doi.org/10.1038/s41467-022-35295-1 Murtaza et al. [2023] Murtaza, H., Ahmed, M., Khan, N.F., Murtaza, G., Zafar, S., Bano, A.: Synthetic data generation: State of the art in health care domain. Computer Science Review 48, 100546 (2023) https://doi.org/10.1016/j.cosrev.2023.100546 [22] Lautrup, A.D., Hyrup, T., Zimek, A., Schneider-Kamp, P.: Systematised review of generative modelling tools and utility metrics for fully synthetic tabular data. [under review] Gower [1971] Gower, J.C.: A general coefficient of similarity and some of its properties. Biometrics 27(4), 857–871 (1971) https://doi.org/10.2307/2528823 Kamal et al. [2019] Kamal, S., ElEleimy, M., Hegazy, D., Nasr, M.: Hepatitis C Virus (HCV) for Egyptian patients. UCI Machine Learning Repository, dataset (2019). https://doi.org/10.24432/c5989v Chawla et al. [2002] Chawla, N.V., Bowyer, K.W., Hall, L.O., Kegelmeyer, W.P.: SMOTE: synthetic minority over-sampling technique. J. Artif. Intell. Res. 16, 321–357 (2002) https://doi.org/10.1613/jair.953 Yoon et al. [2020] Yoon, J., Drumright, L.N., Schaar, M.: Anonymization through data synthesis using generative adversarial networks (ADS-GAN). IEEE J. Biomed. Health Informatics 24(8), 2378–2388 (2020) https://doi.org/10.1109/JBHI.2020.2980262 Xu et al. [2019] Xu, L., Skoularidou, M., Cuesta-Infante, A., Veeramachaneni, K.: Modeling tabular data using conditional GAN. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) 
Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 7333–7343 (2019) Goodfellow et al. [2014] Goodfellow, I.J., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., Bengio, Y.: Generative Adversarial Networks. arXiv, preprint (2014). https://doi.org/10.48550/arXiv.1406.2661 Sun and Erath [2015] Sun, L., Erath, A.: A bayesian network approach for population synthesis. Transportation Research Part C: Emerging Technologies 61, 49–62 (2015) https://doi.org/10.1016/j.trc.2015.10.010 Reiter [2005] Reiter, J.P.: Using cart to generate partially synthetic public use microdata. Journal of Official Statistics 21(3), 441–462 (2005) Drechsler and Reiter [2011] Drechsler, J., Reiter, J.P.: An empirical evaluation of easily implemented, nonparametric methods for generating synthetic datasets. Comput. Stat. Data Anal. 55(12), 3232–3243 (2011) European Medicines Agency [2018] European Medicines Agency: External guidance on the implementation of the European Medicines Agency policy on the publication of clinical data for medicinal products for human use. https://www.ema.europa.eu/en/human-regulatory/marketing-authorisation/clinical-data-publication/support-industry/external-guidance-implementation-european-medicines-agency-policy-publication-clinical-data (2018) Health Canada [2019] Health Canada: Public Release of Clinical Information: guidance document. https://www.canada.ca/en/health-canada/services/drug-health-product-review-approval/profile-public-release-clinical-information-guidance/document.html (2019) Karr et al. [2006] Karr, A.F., Kohnen, C.N., Oganian, A., Reiter, J.P., Sanil, A.P.: A framework for evaluating the utility of data altered to protect confidentiality. The American Statistician 60(3), 224–232 (2006) https://doi.org/10.1198/000313006x124640 Zhu et al. 
[2022] Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). https://doi.org/10.1109/BigData55660.2022.10020639 Villani [2009] Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. [2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. 
Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Rankin, D., Black, M., Bond, R., Wallace, J., Mulvenna, M., Epelde, G.: Reliability of supervised machine learning using synthetic data in health care: Model to preserve privacy for data sharing. JMIR Medical Informatics 8(7), 18910 (2020) https://doi.org/10.2196/18910 van Breugel et al. 
[2021] Breugel, B., Kyono, T., Berrevoets, J., Schaar, M.: DECAF: generating fair synthetic data using causally-aware generative networks. In: Ranzato, M., Beygelzimer, A., Dauphin, Y.N., Liang, P., Vaughan, J.W. (eds.) Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6-14, 2021, Virtual, pp. 22221–22233 (2021) Dankar et al. [2022] Dankar, F.K., Ibrahim, M.K., Ismail, L.: A multi-dimensional evaluation of synthetic data generators. IEEE Access 10, 11147–11158 (2022) https://doi.org/10.1109/access.2022.3144765 Figueira and Vaz [2022] Figueira, A., Vaz, B.: Survey on synthetic data generation, evaluation methods and GANs. Mathematics 10(15), 2733 (2022) https://doi.org/10.3390/math10152733 Dwork and Roth [2013] Dwork, C., Roth, A.: The algorithmic foundations of differential privacy. Foundations and Trends® in Theoretical Computer Science 9(3-4), 211–487 (2013) https://doi.org/10.1561/0400000042 Yale et al. [2020] Yale, A., Dash, S., Bhanot, K., Guyon, I., Erickson, J.S., Bennett, K.P.: Synthesizing quality open data assets from private health research studies. In: Abramowicz, W., Klein, G. (eds.) Business Information Systems Workshops - BIS 2020 International Workshops, Colorado Springs, CO, USA, June 8-10, 2020, Revised Selected Papers. Lecture Notes in Business Information Processing, vol. 394, pp. 324–335 (2020). https://doi.org/10.1007/978-3-030-61146-0_26 Lenz et al. [2021] Lenz, S., Hess, M., Binder, H.: Deep generative models in DataSHIELD. BMC Medical Research Methodology 21(1) (2021) https://doi.org/10.1186/s12874-021-01237-6 Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X., El-Hussuna, A.: Utility metrics for evaluating synthetic health data generation methods: Validation study. JMIR Medical Informatics 10(4), 35734 (2022) https://doi.org/10.2196/35734 Qian et al. 
[2023] Qian, Z., Cebere, B., Schaar, M.: Synthcity: facilitating innovative use cases of synthetic data in different data modalities. arXiv, preprint (2023). https://doi.org/10.48550/arXiv.2301.07573 DataCebo, Inc. [2023] DataCebo, Inc.: Synthetic Data Metrics. (2023). DataCebo, Inc.. Version 0.9.3. https://docs.sdv.dev/sdmetrics/ Brenninkmeijer [2021] Brenninkmeijer, B.: Table Evaluator. GitHub (2021) Yale et al. [2019] Yale, A., Dash, S., Dutta, R., Guyon, I., Pavao, A., Bennett, K.P.: Privacy preserving synthetic health data. In: 27th European Symposium on Artificial Neural Networks, ESANN 2019, Bruges, Belgium, April 24-26, 2019 (2019) Yan et al. [2022] Yan, C., Yan, Y., Wan, Z., Zhang, Z., Omberg, L., Guinney, J., Mooney, S.D., Malin, B.A.: A multifaceted benchmarking of synthetic electronic health record generation models. Nature Communications 13(1) (2022) https://doi.org/10.1038/s41467-022-35295-1 Murtaza et al. [2023] Murtaza, H., Ahmed, M., Khan, N.F., Murtaza, G., Zafar, S., Bano, A.: Synthetic data generation: State of the art in health care domain. Computer Science Review 48, 100546 (2023) https://doi.org/10.1016/j.cosrev.2023.100546 [22] Lautrup, A.D., Hyrup, T., Zimek, A., Schneider-Kamp, P.: Systematised review of generative modelling tools and utility metrics for fully synthetic tabular data. [under review] Gower [1971] Gower, J.C.: A general coefficient of similarity and some of its properties. Biometrics 27(4), 857–871 (1971) https://doi.org/10.2307/2528823 Kamal et al. [2019] Kamal, S., ElEleimy, M., Hegazy, D., Nasr, M.: Hepatitis C Virus (HCV) for Egyptian patients. UCI Machine Learning Repository, dataset (2019). https://doi.org/10.24432/c5989v Chawla et al. [2002] Chawla, N.V., Bowyer, K.W., Hall, L.O., Kegelmeyer, W.P.: SMOTE: synthetic minority over-sampling technique. J. Artif. Intell. Res. 16, 321–357 (2002) https://doi.org/10.1613/jair.953 Yoon et al. 
[2020] Yoon, J., Drumright, L.N., Schaar, M.: Anonymization through data synthesis using generative adversarial networks (ADS-GAN). IEEE J. Biomed. Health Informatics 24(8), 2378–2388 (2020) https://doi.org/10.1109/JBHI.2020.2980262 Xu et al. [2019] Xu, L., Skoularidou, M., Cuesta-Infante, A., Veeramachaneni, K.: Modeling tabular data using conditional GAN. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 7333–7343 (2019) Goodfellow et al. [2014] Goodfellow, I.J., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., Bengio, Y.: Generative Adversarial Networks. arXiv, preprint (2014). https://doi.org/10.48550/arXiv.1406.2661 Sun and Erath [2015] Sun, L., Erath, A.: A bayesian network approach for population synthesis. Transportation Research Part C: Emerging Technologies 61, 49–62 (2015) https://doi.org/10.1016/j.trc.2015.10.010 Reiter [2005] Reiter, J.P.: Using cart to generate partially synthetic public use microdata. Journal of Official Statistics 21(3), 441–462 (2005) Drechsler and Reiter [2011] Drechsler, J., Reiter, J.P.: An empirical evaluation of easily implemented, nonparametric methods for generating synthetic datasets. Comput. Stat. Data Anal. 55(12), 3232–3243 (2011) European Medicines Agency [2018] European Medicines Agency: External guidance on the implementation of the European Medicines Agency policy on the publication of clinical data for medicinal products for human use. https://www.ema.europa.eu/en/human-regulatory/marketing-authorisation/clinical-data-publication/support-industry/external-guidance-implementation-european-medicines-agency-policy-publication-clinical-data (2018) Health Canada [2019] Health Canada: Public Release of Clinical Information: guidance document. 
JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Yale, A., Dash, S., Bhanot, K., Guyon, I., Erickson, J.S., Bennett, K.P.: Synthesizing quality open data assets from private health research studies. In: Abramowicz, W., Klein, G. (eds.) Business Information Systems Workshops - BIS 2020 International Workshops, Colorado Springs, CO, USA, June 8-10, 2020, Revised Selected Papers. Lecture Notes in Business Information Processing, vol. 394, pp. 324–335 (2020). https://doi.org/10.1007/978-3-030-61146-0_26 Lenz et al. [2021] Lenz, S., Hess, M., Binder, H.: Deep generative models in DataSHIELD. BMC Medical Research Methodology 21(1) (2021) https://doi.org/10.1186/s12874-021-01237-6 Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X., El-Hussuna, A.: Utility metrics for evaluating synthetic health data generation methods: Validation study. JMIR Medical Informatics 10(4), 35734 (2022) https://doi.org/10.2196/35734 Qian et al. [2023] Qian, Z., Cebere, B., Schaar, M.: Synthcity: facilitating innovative use cases of synthetic data in different data modalities. arXiv, preprint (2023). https://doi.org/10.48550/arXiv.2301.07573 DataCebo, Inc. [2023] DataCebo, Inc.: Synthetic Data Metrics. (2023). DataCebo, Inc.. Version 0.9.3. https://docs.sdv.dev/sdmetrics/ Brenninkmeijer [2021] Brenninkmeijer, B.: Table Evaluator. GitHub (2021) Yale et al. [2019] Yale, A., Dash, S., Dutta, R., Guyon, I., Pavao, A., Bennett, K.P.: Privacy preserving synthetic health data. In: 27th European Symposium on Artificial Neural Networks, ESANN 2019, Bruges, Belgium, April 24-26, 2019 (2019) Yan et al. [2022] Yan, C., Yan, Y., Wan, Z., Zhang, Z., Omberg, L., Guinney, J., Mooney, S.D., Malin, B.A.: A multifaceted benchmarking of synthetic electronic health record generation models. Nature Communications 13(1) (2022) https://doi.org/10.1038/s41467-022-35295-1 Murtaza et al. 
[2023] Murtaza, H., Ahmed, M., Khan, N.F., Murtaza, G., Zafar, S., Bano, A.: Synthetic data generation: State of the art in health care domain. Computer Science Review 48, 100546 (2023) https://doi.org/10.1016/j.cosrev.2023.100546 [22] Lautrup, A.D., Hyrup, T., Zimek, A., Schneider-Kamp, P.: Systematised review of generative modelling tools and utility metrics for fully synthetic tabular data. [under review] Gower [1971] Gower, J.C.: A general coefficient of similarity and some of its properties. Biometrics 27(4), 857–871 (1971) https://doi.org/10.2307/2528823 Kamal et al. [2019] Kamal, S., ElEleimy, M., Hegazy, D., Nasr, M.: Hepatitis C Virus (HCV) for Egyptian patients. UCI Machine Learning Repository, dataset (2019). https://doi.org/10.24432/c5989v Chawla et al. [2002] Chawla, N.V., Bowyer, K.W., Hall, L.O., Kegelmeyer, W.P.: SMOTE: synthetic minority over-sampling technique. J. Artif. Intell. Res. 16, 321–357 (2002) https://doi.org/10.1613/jair.953 Yoon et al. [2020] Yoon, J., Drumright, L.N., Schaar, M.: Anonymization through data synthesis using generative adversarial networks (ADS-GAN). IEEE J. Biomed. Health Informatics 24(8), 2378–2388 (2020) https://doi.org/10.1109/JBHI.2020.2980262 Xu et al. [2019] Xu, L., Skoularidou, M., Cuesta-Infante, A., Veeramachaneni, K.: Modeling tabular data using conditional GAN. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 7333–7343 (2019) Goodfellow et al. [2014] Goodfellow, I.J., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., Bengio, Y.: Generative Adversarial Networks. arXiv, preprint (2014). https://doi.org/10.48550/arXiv.1406.2661 Sun and Erath [2015] Sun, L., Erath, A.: A bayesian network approach for population synthesis. 
Transportation Research Part C: Emerging Technologies 61, 49–62 (2015) https://doi.org/10.1016/j.trc.2015.10.010 Reiter [2005] Reiter, J.P.: Using cart to generate partially synthetic public use microdata. Journal of Official Statistics 21(3), 441–462 (2005) Drechsler and Reiter [2011] Drechsler, J., Reiter, J.P.: An empirical evaluation of easily implemented, nonparametric methods for generating synthetic datasets. Comput. Stat. Data Anal. 55(12), 3232–3243 (2011) European Medicines Agency [2018] European Medicines Agency: External guidance on the implementation of the European Medicines Agency policy on the publication of clinical data for medicinal products for human use. https://www.ema.europa.eu/en/human-regulatory/marketing-authorisation/clinical-data-publication/support-industry/external-guidance-implementation-european-medicines-agency-policy-publication-clinical-data (2018) Health Canada [2019] Health Canada: Public Release of Clinical Information: guidance document. https://www.canada.ca/en/health-canada/services/drug-health-product-review-approval/profile-public-release-clinical-information-guidance/document.html (2019) Karr et al. [2006] Karr, A.F., Kohnen, C.N., Oganian, A., Reiter, J.P., Sanil, A.P.: A framework for evaluating the utility of data altered to protect confidentiality. The American Statistician 60(3), 224–232 (2006) https://doi.org/10.1198/000313006x124640 Zhu et al. [2022] Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). https://doi.org/10.1109/BigData55660.2022.10020639 Villani [2009] Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. 
[2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). 
https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Lenz, S., Hess, M., Binder, H.: Deep generative models in DataSHIELD. BMC Medical Research Methodology 21(1) (2021) https://doi.org/10.1186/s12874-021-01237-6 Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X., El-Hussuna, A.: Utility metrics for evaluating synthetic health data generation methods: Validation study. JMIR Medical Informatics 10(4), 35734 (2022) https://doi.org/10.2196/35734 Qian et al. [2023] Qian, Z., Cebere, B., Schaar, M.: Synthcity: facilitating innovative use cases of synthetic data in different data modalities. arXiv, preprint (2023). https://doi.org/10.48550/arXiv.2301.07573 DataCebo, Inc. [2023] DataCebo, Inc.: Synthetic Data Metrics. (2023). DataCebo, Inc.. Version 0.9.3. https://docs.sdv.dev/sdmetrics/ Brenninkmeijer [2021] Brenninkmeijer, B.: Table Evaluator. GitHub (2021) Yale et al. [2019] Yale, A., Dash, S., Dutta, R., Guyon, I., Pavao, A., Bennett, K.P.: Privacy preserving synthetic health data. 
In: 27th European Symposium on Artificial Neural Networks, ESANN 2019, Bruges, Belgium, April 24-26, 2019 (2019) Yan et al. [2022] Yan, C., Yan, Y., Wan, Z., Zhang, Z., Omberg, L., Guinney, J., Mooney, S.D., Malin, B.A.: A multifaceted benchmarking of synthetic electronic health record generation models. Nature Communications 13(1) (2022) https://doi.org/10.1038/s41467-022-35295-1 Murtaza et al. [2023] Murtaza, H., Ahmed, M., Khan, N.F., Murtaza, G., Zafar, S., Bano, A.: Synthetic data generation: State of the art in health care domain. Computer Science Review 48, 100546 (2023) https://doi.org/10.1016/j.cosrev.2023.100546 [22] Lautrup, A.D., Hyrup, T., Zimek, A., Schneider-Kamp, P.: Systematised review of generative modelling tools and utility metrics for fully synthetic tabular data. [under review] Gower [1971] Gower, J.C.: A general coefficient of similarity and some of its properties. Biometrics 27(4), 857–871 (1971) https://doi.org/10.2307/2528823 Kamal et al. [2019] Kamal, S., ElEleimy, M., Hegazy, D., Nasr, M.: Hepatitis C Virus (HCV) for Egyptian patients. UCI Machine Learning Repository, dataset (2019). https://doi.org/10.24432/c5989v Chawla et al. [2002] Chawla, N.V., Bowyer, K.W., Hall, L.O., Kegelmeyer, W.P.: SMOTE: synthetic minority over-sampling technique. J. Artif. Intell. Res. 16, 321–357 (2002) https://doi.org/10.1613/jair.953 Yoon et al. [2020] Yoon, J., Drumright, L.N., Schaar, M.: Anonymization through data synthesis using generative adversarial networks (ADS-GAN). IEEE J. Biomed. Health Informatics 24(8), 2378–2388 (2020) https://doi.org/10.1109/JBHI.2020.2980262 Xu et al. [2019] Xu, L., Skoularidou, M., Cuesta-Infante, A., Veeramachaneni, K.: Modeling tabular data using conditional GAN. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) 
Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 7333–7343 (2019) Goodfellow et al. [2014] Goodfellow, I.J., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., Bengio, Y.: Generative Adversarial Networks. arXiv, preprint (2014). https://doi.org/10.48550/arXiv.1406.2661 Sun and Erath [2015] Sun, L., Erath, A.: A bayesian network approach for population synthesis. Transportation Research Part C: Emerging Technologies 61, 49–62 (2015) https://doi.org/10.1016/j.trc.2015.10.010 Reiter [2005] Reiter, J.P.: Using cart to generate partially synthetic public use microdata. Journal of Official Statistics 21(3), 441–462 (2005) Drechsler and Reiter [2011] Drechsler, J., Reiter, J.P.: An empirical evaluation of easily implemented, nonparametric methods for generating synthetic datasets. Comput. Stat. Data Anal. 55(12), 3232–3243 (2011) European Medicines Agency [2018] European Medicines Agency: External guidance on the implementation of the European Medicines Agency policy on the publication of clinical data for medicinal products for human use. https://www.ema.europa.eu/en/human-regulatory/marketing-authorisation/clinical-data-publication/support-industry/external-guidance-implementation-european-medicines-agency-policy-publication-clinical-data (2018) Health Canada [2019] Health Canada: Public Release of Clinical Information: guidance document. https://www.canada.ca/en/health-canada/services/drug-health-product-review-approval/profile-public-release-clinical-information-guidance/document.html (2019) Karr et al. [2006] Karr, A.F., Kohnen, C.N., Oganian, A., Reiter, J.P., Sanil, A.P.: A framework for evaluating the utility of data altered to protect confidentiality. The American Statistician 60(3), 224–232 (2006) https://doi.org/10.1198/000313006x124640 Zhu et al. 
[2022] Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). https://doi.org/10.1109/BigData55660.2022.10020639 Villani [2009] Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. [2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. 
Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Emam, K.E., Mosquera, L., Fang, X., El-Hussuna, A.: Utility metrics for evaluating synthetic health data generation methods: Validation study. JMIR Medical Informatics 10(4), 35734 (2022) https://doi.org/10.2196/35734 Qian et al. 
[2023] Qian, Z., Cebere, B., Schaar, M.: Synthcity: facilitating innovative use cases of synthetic data in different data modalities. arXiv, preprint (2023). https://doi.org/10.48550/arXiv.2301.07573 DataCebo, Inc. [2023] DataCebo, Inc.: Synthetic Data Metrics. (2023). DataCebo, Inc.. Version 0.9.3. https://docs.sdv.dev/sdmetrics/ Brenninkmeijer [2021] Brenninkmeijer, B.: Table Evaluator. GitHub (2021) Yale et al. [2019] Yale, A., Dash, S., Dutta, R., Guyon, I., Pavao, A., Bennett, K.P.: Privacy preserving synthetic health data. In: 27th European Symposium on Artificial Neural Networks, ESANN 2019, Bruges, Belgium, April 24-26, 2019 (2019) Yan et al. [2022] Yan, C., Yan, Y., Wan, Z., Zhang, Z., Omberg, L., Guinney, J., Mooney, S.D., Malin, B.A.: A multifaceted benchmarking of synthetic electronic health record generation models. Nature Communications 13(1) (2022) https://doi.org/10.1038/s41467-022-35295-1 Murtaza et al. [2023] Murtaza, H., Ahmed, M., Khan, N.F., Murtaza, G., Zafar, S., Bano, A.: Synthetic data generation: State of the art in health care domain. Computer Science Review 48, 100546 (2023) https://doi.org/10.1016/j.cosrev.2023.100546 [22] Lautrup, A.D., Hyrup, T., Zimek, A., Schneider-Kamp, P.: Systematised review of generative modelling tools and utility metrics for fully synthetic tabular data. [under review] Gower [1971] Gower, J.C.: A general coefficient of similarity and some of its properties. Biometrics 27(4), 857–871 (1971) https://doi.org/10.2307/2528823 Kamal et al. [2019] Kamal, S., ElEleimy, M., Hegazy, D., Nasr, M.: Hepatitis C Virus (HCV) for Egyptian patients. UCI Machine Learning Repository, dataset (2019). https://doi.org/10.24432/c5989v Chawla et al. [2002] Chawla, N.V., Bowyer, K.W., Hall, L.O., Kegelmeyer, W.P.: SMOTE: synthetic minority over-sampling technique. J. Artif. Intell. Res. 16, 321–357 (2002) https://doi.org/10.1613/jair.953 Yoon et al. 
[2020] Yoon, J., Drumright, L.N., Schaar, M.: Anonymization through data synthesis using generative adversarial networks (ADS-GAN). IEEE J. Biomed. Health Informatics 24(8), 2378–2388 (2020) https://doi.org/10.1109/JBHI.2020.2980262 Xu et al. [2019] Xu, L., Skoularidou, M., Cuesta-Infante, A., Veeramachaneni, K.: Modeling tabular data using conditional GAN. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 7333–7343 (2019) Goodfellow et al. [2014] Goodfellow, I.J., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., Bengio, Y.: Generative Adversarial Networks. arXiv, preprint (2014). https://doi.org/10.48550/arXiv.1406.2661 Sun and Erath [2015] Sun, L., Erath, A.: A bayesian network approach for population synthesis. Transportation Research Part C: Emerging Technologies 61, 49–62 (2015) https://doi.org/10.1016/j.trc.2015.10.010 Reiter [2005] Reiter, J.P.: Using cart to generate partially synthetic public use microdata. Journal of Official Statistics 21(3), 441–462 (2005) Drechsler and Reiter [2011] Drechsler, J., Reiter, J.P.: An empirical evaluation of easily implemented, nonparametric methods for generating synthetic datasets. Comput. Stat. Data Anal. 55(12), 3232–3243 (2011) European Medicines Agency [2018] European Medicines Agency: External guidance on the implementation of the European Medicines Agency policy on the publication of clinical data for medicinal products for human use. https://www.ema.europa.eu/en/human-regulatory/marketing-authorisation/clinical-data-publication/support-industry/external-guidance-implementation-european-medicines-agency-policy-publication-clinical-data (2018) Health Canada [2019] Health Canada: Public Release of Clinical Information: guidance document. 
https://www.canada.ca/en/health-canada/services/drug-health-product-review-approval/profile-public-release-clinical-information-guidance/document.html (2019) Karr et al. [2006] Karr, A.F., Kohnen, C.N., Oganian, A., Reiter, J.P., Sanil, A.P.: A framework for evaluating the utility of data altered to protect confidentiality. The American Statistician 60(3), 224–232 (2006) https://doi.org/10.1198/000313006x124640 Zhu et al. [2022] Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). https://doi.org/10.1109/BigData55660.2022.10020639 Villani [2009] Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. [2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. 
arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Qian, Z., Cebere, B., Schaar, M.: Synthcity: facilitating innovative use cases of synthetic data in different data modalities. 
arXiv, preprint (2023). https://doi.org/10.48550/arXiv.2301.07573 DataCebo, Inc. [2023] DataCebo, Inc.: Synthetic Data Metrics. (2023). DataCebo, Inc.. Version 0.9.3. https://docs.sdv.dev/sdmetrics/ Brenninkmeijer [2021] Brenninkmeijer, B.: Table Evaluator. GitHub (2021) Yale et al. [2019] Yale, A., Dash, S., Dutta, R., Guyon, I., Pavao, A., Bennett, K.P.: Privacy preserving synthetic health data. In: 27th European Symposium on Artificial Neural Networks, ESANN 2019, Bruges, Belgium, April 24-26, 2019 (2019) Yan et al. [2022] Yan, C., Yan, Y., Wan, Z., Zhang, Z., Omberg, L., Guinney, J., Mooney, S.D., Malin, B.A.: A multifaceted benchmarking of synthetic electronic health record generation models. Nature Communications 13(1) (2022) https://doi.org/10.1038/s41467-022-35295-1 Murtaza et al. [2023] Murtaza, H., Ahmed, M., Khan, N.F., Murtaza, G., Zafar, S., Bano, A.: Synthetic data generation: State of the art in health care domain. Computer Science Review 48, 100546 (2023) https://doi.org/10.1016/j.cosrev.2023.100546 [22] Lautrup, A.D., Hyrup, T., Zimek, A., Schneider-Kamp, P.: Systematised review of generative modelling tools and utility metrics for fully synthetic tabular data. [under review] Gower [1971] Gower, J.C.: A general coefficient of similarity and some of its properties. Biometrics 27(4), 857–871 (1971) https://doi.org/10.2307/2528823 Kamal et al. [2019] Kamal, S., ElEleimy, M., Hegazy, D., Nasr, M.: Hepatitis C Virus (HCV) for Egyptian patients. UCI Machine Learning Repository, dataset (2019). https://doi.org/10.24432/c5989v Chawla et al. [2002] Chawla, N.V., Bowyer, K.W., Hall, L.O., Kegelmeyer, W.P.: SMOTE: synthetic minority over-sampling technique. J. Artif. Intell. Res. 16, 321–357 (2002) https://doi.org/10.1613/jair.953 Yoon et al. [2020] Yoon, J., Drumright, L.N., Schaar, M.: Anonymization through data synthesis using generative adversarial networks (ADS-GAN). IEEE J. Biomed. 
Health Informatics 24(8), 2378–2388 (2020) https://doi.org/10.1109/JBHI.2020.2980262 Xu et al. [2019] Xu, L., Skoularidou, M., Cuesta-Infante, A., Veeramachaneni, K.: Modeling tabular data using conditional GAN. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 7333–7343 (2019) Goodfellow et al. [2014] Goodfellow, I.J., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., Bengio, Y.: Generative Adversarial Networks. arXiv, preprint (2014). https://doi.org/10.48550/arXiv.1406.2661 Sun and Erath [2015] Sun, L., Erath, A.: A bayesian network approach for population synthesis. Transportation Research Part C: Emerging Technologies 61, 49–62 (2015) https://doi.org/10.1016/j.trc.2015.10.010 Reiter [2005] Reiter, J.P.: Using cart to generate partially synthetic public use microdata. Journal of Official Statistics 21(3), 441–462 (2005) Drechsler and Reiter [2011] Drechsler, J., Reiter, J.P.: An empirical evaluation of easily implemented, nonparametric methods for generating synthetic datasets. Comput. Stat. Data Anal. 55(12), 3232–3243 (2011) European Medicines Agency [2018] European Medicines Agency: External guidance on the implementation of the European Medicines Agency policy on the publication of clinical data for medicinal products for human use. https://www.ema.europa.eu/en/human-regulatory/marketing-authorisation/clinical-data-publication/support-industry/external-guidance-implementation-european-medicines-agency-policy-publication-clinical-data (2018) Health Canada [2019] Health Canada: Public Release of Clinical Information: guidance document. https://www.canada.ca/en/health-canada/services/drug-health-product-review-approval/profile-public-release-clinical-information-guidance/document.html (2019) Karr et al. 
[2006] Karr, A.F., Kohnen, C.N., Oganian, A., Reiter, J.P., Sanil, A.P.: A framework for evaluating the utility of data altered to protect confidentiality. The American Statistician 60(3), 224–232 (2006) https://doi.org/10.1198/000313006x124640 Zhu et al. [2022] Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). https://doi.org/10.1109/BigData55660.2022.10020639 Villani [2009] Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. [2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. 
[2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 DataCebo, Inc.: Synthetic Data Metrics. (2023). DataCebo, Inc.. Version 0.9.3. https://docs.sdv.dev/sdmetrics/ Brenninkmeijer [2021] Brenninkmeijer, B.: Table Evaluator. GitHub (2021) Yale et al. 
[2019] Yale, A., Dash, S., Dutta, R., Guyon, I., Pavao, A., Bennett, K.P.: Privacy preserving synthetic health data. In: 27th European Symposium on Artificial Neural Networks, ESANN 2019, Bruges, Belgium, April 24-26, 2019 (2019) Yan et al. [2022] Yan, C., Yan, Y., Wan, Z., Zhang, Z., Omberg, L., Guinney, J., Mooney, S.D., Malin, B.A.: A multifaceted benchmarking of synthetic electronic health record generation models. Nature Communications 13(1) (2022) https://doi.org/10.1038/s41467-022-35295-1 Murtaza et al. [2023] Murtaza, H., Ahmed, M., Khan, N.F., Murtaza, G., Zafar, S., Bano, A.: Synthetic data generation: State of the art in health care domain. Computer Science Review 48, 100546 (2023) https://doi.org/10.1016/j.cosrev.2023.100546 [22] Lautrup, A.D., Hyrup, T., Zimek, A., Schneider-Kamp, P.: Systematised review of generative modelling tools and utility metrics for fully synthetic tabular data. [under review] Gower [1971] Gower, J.C.: A general coefficient of similarity and some of its properties. Biometrics 27(4), 857–871 (1971) https://doi.org/10.2307/2528823 Kamal et al. [2019] Kamal, S., ElEleimy, M., Hegazy, D., Nasr, M.: Hepatitis C Virus (HCV) for Egyptian patients. UCI Machine Learning Repository, dataset (2019). https://doi.org/10.24432/c5989v Chawla et al. [2002] Chawla, N.V., Bowyer, K.W., Hall, L.O., Kegelmeyer, W.P.: SMOTE: synthetic minority over-sampling technique. J. Artif. Intell. Res. 16, 321–357 (2002) https://doi.org/10.1613/jair.953 Yoon et al. [2020] Yoon, J., Drumright, L.N., Schaar, M.: Anonymization through data synthesis using generative adversarial networks (ADS-GAN). IEEE J. Biomed. Health Informatics 24(8), 2378–2388 (2020) https://doi.org/10.1109/JBHI.2020.2980262 Xu et al. [2019] Xu, L., Skoularidou, M., Cuesta-Infante, A., Veeramachaneni, K.: Modeling tabular data using conditional GAN. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) 
Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 7333–7343 (2019) Goodfellow et al. [2014] Goodfellow, I.J., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., Bengio, Y.: Generative Adversarial Networks. arXiv, preprint (2014). https://doi.org/10.48550/arXiv.1406.2661 Sun and Erath [2015] Sun, L., Erath, A.: A bayesian network approach for population synthesis. Transportation Research Part C: Emerging Technologies 61, 49–62 (2015) https://doi.org/10.1016/j.trc.2015.10.010 Reiter [2005] Reiter, J.P.: Using cart to generate partially synthetic public use microdata. Journal of Official Statistics 21(3), 441–462 (2005) Drechsler and Reiter [2011] Drechsler, J., Reiter, J.P.: An empirical evaluation of easily implemented, nonparametric methods for generating synthetic datasets. Comput. Stat. Data Anal. 55(12), 3232–3243 (2011) European Medicines Agency [2018] European Medicines Agency: External guidance on the implementation of the European Medicines Agency policy on the publication of clinical data for medicinal products for human use. https://www.ema.europa.eu/en/human-regulatory/marketing-authorisation/clinical-data-publication/support-industry/external-guidance-implementation-european-medicines-agency-policy-publication-clinical-data (2018) Health Canada [2019] Health Canada: Public Release of Clinical Information: guidance document. https://www.canada.ca/en/health-canada/services/drug-health-product-review-approval/profile-public-release-clinical-information-guidance/document.html (2019) Karr et al. [2006] Karr, A.F., Kohnen, C.N., Oganian, A., Reiter, J.P., Sanil, A.P.: A framework for evaluating the utility of data altered to protect confidentiality. The American Statistician 60(3), 224–232 (2006) https://doi.org/10.1198/000313006x124640 Zhu et al. 
[2022] Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). https://doi.org/10.1109/BigData55660.2022.10020639 Villani [2009] Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. [2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. 
Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Brenninkmeijer, B.: Table Evaluator. GitHub (2021) Yale et al. [2019] Yale, A., Dash, S., Dutta, R., Guyon, I., Pavao, A., Bennett, K.P.: Privacy preserving synthetic health data. In: 27th European Symposium on Artificial Neural Networks, ESANN 2019, Bruges, Belgium, April 24-26, 2019 (2019) Yan et al. 
[2022] Yan, C., Yan, Y., Wan, Z., Zhang, Z., Omberg, L., Guinney, J., Mooney, S.D., Malin, B.A.: A multifaceted benchmarking of synthetic electronic health record generation models. Nature Communications 13(1) (2022) https://doi.org/10.1038/s41467-022-35295-1 Murtaza et al. [2023] Murtaza, H., Ahmed, M., Khan, N.F., Murtaza, G., Zafar, S., Bano, A.: Synthetic data generation: State of the art in health care domain. Computer Science Review 48, 100546 (2023) https://doi.org/10.1016/j.cosrev.2023.100546 [22] Lautrup, A.D., Hyrup, T., Zimek, A., Schneider-Kamp, P.: Systematised review of generative modelling tools and utility metrics for fully synthetic tabular data. [under review] Gower [1971] Gower, J.C.: A general coefficient of similarity and some of its properties. Biometrics 27(4), 857–871 (1971) https://doi.org/10.2307/2528823 Kamal et al. [2019] Kamal, S., ElEleimy, M., Hegazy, D., Nasr, M.: Hepatitis C Virus (HCV) for Egyptian patients. UCI Machine Learning Repository, dataset (2019). https://doi.org/10.24432/c5989v Chawla et al. [2002] Chawla, N.V., Bowyer, K.W., Hall, L.O., Kegelmeyer, W.P.: SMOTE: synthetic minority over-sampling technique. J. Artif. Intell. Res. 16, 321–357 (2002) https://doi.org/10.1613/jair.953 Yoon et al. [2020] Yoon, J., Drumright, L.N., Schaar, M.: Anonymization through data synthesis using generative adversarial networks (ADS-GAN). IEEE J. Biomed. Health Informatics 24(8), 2378–2388 (2020) https://doi.org/10.1109/JBHI.2020.2980262 Xu et al. [2019] Xu, L., Skoularidou, M., Cuesta-Infante, A., Veeramachaneni, K.: Modeling tabular data using conditional GAN. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 7333–7343 (2019) Goodfellow et al. 
[2014] Goodfellow, I.J., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., Bengio, Y.: Generative Adversarial Networks. arXiv, preprint (2014). https://doi.org/10.48550/arXiv.1406.2661 Sun and Erath [2015] Sun, L., Erath, A.: A bayesian network approach for population synthesis. Transportation Research Part C: Emerging Technologies 61, 49–62 (2015) https://doi.org/10.1016/j.trc.2015.10.010 Reiter [2005] Reiter, J.P.: Using cart to generate partially synthetic public use microdata. Journal of Official Statistics 21(3), 441–462 (2005) Drechsler and Reiter [2011] Drechsler, J., Reiter, J.P.: An empirical evaluation of easily implemented, nonparametric methods for generating synthetic datasets. Comput. Stat. Data Anal. 55(12), 3232–3243 (2011) European Medicines Agency [2018] European Medicines Agency: External guidance on the implementation of the European Medicines Agency policy on the publication of clinical data for medicinal products for human use. https://www.ema.europa.eu/en/human-regulatory/marketing-authorisation/clinical-data-publication/support-industry/external-guidance-implementation-european-medicines-agency-policy-publication-clinical-data (2018) Health Canada [2019] Health Canada: Public Release of Clinical Information: guidance document. https://www.canada.ca/en/health-canada/services/drug-health-product-review-approval/profile-public-release-clinical-information-guidance/document.html (2019) Karr et al. [2006] Karr, A.F., Kohnen, C.N., Oganian, A., Reiter, J.P., Sanil, A.P.: A framework for evaluating the utility of data altered to protect confidentiality. The American Statistician 60(3), 224–232 (2006) https://doi.org/10.1198/000313006x124640 Zhu et al. [2022] Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) 
IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). https://doi.org/10.1109/BigData55660.2022.10020639 Villani [2009] Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. [2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. 
[2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Yale, A., Dash, S., Dutta, R., Guyon, I., Pavao, A., Bennett, K.P.: Privacy preserving synthetic health data. In: 27th European Symposium on Artificial Neural Networks, ESANN 2019, Bruges, Belgium, April 24-26, 2019 (2019) Yan et al. [2022] Yan, C., Yan, Y., Wan, Z., Zhang, Z., Omberg, L., Guinney, J., Mooney, S.D., Malin, B.A.: A multifaceted benchmarking of synthetic electronic health record generation models. Nature Communications 13(1) (2022) https://doi.org/10.1038/s41467-022-35295-1 Murtaza et al. [2023] Murtaza, H., Ahmed, M., Khan, N.F., Murtaza, G., Zafar, S., Bano, A.: Synthetic data generation: State of the art in health care domain. 
Computer Science Review 48, 100546 (2023) https://doi.org/10.1016/j.cosrev.2023.100546 [22] Lautrup, A.D., Hyrup, T., Zimek, A., Schneider-Kamp, P.: Systematised review of generative modelling tools and utility metrics for fully synthetic tabular data. [under review] Gower [1971] Gower, J.C.: A general coefficient of similarity and some of its properties. Biometrics 27(4), 857–871 (1971) https://doi.org/10.2307/2528823 Kamal et al. [2019] Kamal, S., ElEleimy, M., Hegazy, D., Nasr, M.: Hepatitis C Virus (HCV) for Egyptian patients. UCI Machine Learning Repository, dataset (2019). https://doi.org/10.24432/c5989v Chawla et al. [2002] Chawla, N.V., Bowyer, K.W., Hall, L.O., Kegelmeyer, W.P.: SMOTE: synthetic minority over-sampling technique. J. Artif. Intell. Res. 16, 321–357 (2002) https://doi.org/10.1613/jair.953 Yoon et al. [2020] Yoon, J., Drumright, L.N., Schaar, M.: Anonymization through data synthesis using generative adversarial networks (ADS-GAN). IEEE J. Biomed. Health Informatics 24(8), 2378–2388 (2020) https://doi.org/10.1109/JBHI.2020.2980262 Xu et al. [2019] Xu, L., Skoularidou, M., Cuesta-Infante, A., Veeramachaneni, K.: Modeling tabular data using conditional GAN. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 7333–7343 (2019) Goodfellow et al. [2014] Goodfellow, I.J., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., Bengio, Y.: Generative Adversarial Networks. arXiv, preprint (2014). https://doi.org/10.48550/arXiv.1406.2661 Sun and Erath [2015] Sun, L., Erath, A.: A bayesian network approach for population synthesis. 
Transportation Research Part C: Emerging Technologies 61, 49–62 (2015) https://doi.org/10.1016/j.trc.2015.10.010 Reiter [2005] Reiter, J.P.: Using cart to generate partially synthetic public use microdata. Journal of Official Statistics 21(3), 441–462 (2005) Drechsler and Reiter [2011] Drechsler, J., Reiter, J.P.: An empirical evaluation of easily implemented, nonparametric methods for generating synthetic datasets. Comput. Stat. Data Anal. 55(12), 3232–3243 (2011) European Medicines Agency [2018] European Medicines Agency: External guidance on the implementation of the European Medicines Agency policy on the publication of clinical data for medicinal products for human use. https://www.ema.europa.eu/en/human-regulatory/marketing-authorisation/clinical-data-publication/support-industry/external-guidance-implementation-european-medicines-agency-policy-publication-clinical-data (2018) Health Canada [2019] Health Canada: Public Release of Clinical Information: guidance document. https://www.canada.ca/en/health-canada/services/drug-health-product-review-approval/profile-public-release-clinical-information-guidance/document.html (2019) Karr et al. [2006] Karr, A.F., Kohnen, C.N., Oganian, A., Reiter, J.P., Sanil, A.P.: A framework for evaluating the utility of data altered to protect confidentiality. The American Statistician 60(3), 224–232 (2006) https://doi.org/10.1198/000313006x124640 Zhu et al. [2022] Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). https://doi.org/10.1109/BigData55660.2022.10020639 Villani [2009] Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. 
[2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). 
https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Yan, C., Yan, Y., Wan, Z., Zhang, Z., Omberg, L., Guinney, J., Mooney, S.D., Malin, B.A.: A multifaceted benchmarking of synthetic electronic health record generation models. Nature Communications 13(1) (2022) https://doi.org/10.1038/s41467-022-35295-1 Murtaza et al. [2023] Murtaza, H., Ahmed, M., Khan, N.F., Murtaza, G., Zafar, S., Bano, A.: Synthetic data generation: State of the art in health care domain. Computer Science Review 48, 100546 (2023) https://doi.org/10.1016/j.cosrev.2023.100546 [22] Lautrup, A.D., Hyrup, T., Zimek, A., Schneider-Kamp, P.: Systematised review of generative modelling tools and utility metrics for fully synthetic tabular data. [under review] Gower [1971] Gower, J.C.: A general coefficient of similarity and some of its properties. Biometrics 27(4), 857–871 (1971) https://doi.org/10.2307/2528823 Kamal et al. [2019] Kamal, S., ElEleimy, M., Hegazy, D., Nasr, M.: Hepatitis C Virus (HCV) for Egyptian patients. UCI Machine Learning Repository, dataset (2019). https://doi.org/10.24432/c5989v Chawla et al. 
[2002] Chawla, N.V., Bowyer, K.W., Hall, L.O., Kegelmeyer, W.P.: SMOTE: synthetic minority over-sampling technique. J. Artif. Intell. Res. 16, 321–357 (2002) https://doi.org/10.1613/jair.953 Yoon et al. [2020] Yoon, J., Drumright, L.N., Schaar, M.: Anonymization through data synthesis using generative adversarial networks (ADS-GAN). IEEE J. Biomed. Health Informatics 24(8), 2378–2388 (2020) https://doi.org/10.1109/JBHI.2020.2980262 Xu et al. [2019] Xu, L., Skoularidou, M., Cuesta-Infante, A., Veeramachaneni, K.: Modeling tabular data using conditional GAN. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 7333–7343 (2019) Goodfellow et al. [2014] Goodfellow, I.J., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., Bengio, Y.: Generative Adversarial Networks. arXiv, preprint (2014). https://doi.org/10.48550/arXiv.1406.2661 Sun and Erath [2015] Sun, L., Erath, A.: A bayesian network approach for population synthesis. Transportation Research Part C: Emerging Technologies 61, 49–62 (2015) https://doi.org/10.1016/j.trc.2015.10.010 Reiter [2005] Reiter, J.P.: Using cart to generate partially synthetic public use microdata. Journal of Official Statistics 21(3), 441–462 (2005) Drechsler and Reiter [2011] Drechsler, J., Reiter, J.P.: An empirical evaluation of easily implemented, nonparametric methods for generating synthetic datasets. Comput. Stat. Data Anal. 55(12), 3232–3243 (2011) European Medicines Agency [2018] European Medicines Agency: External guidance on the implementation of the European Medicines Agency policy on the publication of clinical data for medicinal products for human use. 
https://www.ema.europa.eu/en/human-regulatory/marketing-authorisation/clinical-data-publication/support-industry/external-guidance-implementation-european-medicines-agency-policy-publication-clinical-data (2018) Health Canada [2019] Health Canada: Public Release of Clinical Information: guidance document. https://www.canada.ca/en/health-canada/services/drug-health-product-review-approval/profile-public-release-clinical-information-guidance/document.html (2019) Karr et al. [2006] Karr, A.F., Kohnen, C.N., Oganian, A., Reiter, J.P., Sanil, A.P.: A framework for evaluating the utility of data altered to protect confidentiality. The American Statistician 60(3), 224–232 (2006) https://doi.org/10.1198/000313006x124640 Zhu et al. [2022] Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). https://doi.org/10.1109/BigData55660.2022.10020639 Villani [2009] Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. [2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. 
[2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. 
[2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Murtaza, H., Ahmed, M., Khan, N.F., Murtaza, G., Zafar, S., Bano, A.: Synthetic data generation: State of the art in health care domain. Computer Science Review 48, 100546 (2023) https://doi.org/10.1016/j.cosrev.2023.100546 [22] Lautrup, A.D., Hyrup, T., Zimek, A., Schneider-Kamp, P.: Systematised review of generative modelling tools and utility metrics for fully synthetic tabular data. [under review] Gower [1971] Gower, J.C.: A general coefficient of similarity and some of its properties. Biometrics 27(4), 857–871 (1971) https://doi.org/10.2307/2528823 Kamal et al. [2019] Kamal, S., ElEleimy, M., Hegazy, D., Nasr, M.: Hepatitis C Virus (HCV) for Egyptian patients. UCI Machine Learning Repository, dataset (2019). https://doi.org/10.24432/c5989v Chawla et al. [2002] Chawla, N.V., Bowyer, K.W., Hall, L.O., Kegelmeyer, W.P.: SMOTE: synthetic minority over-sampling technique. J. Artif. Intell. Res. 16, 321–357 (2002) https://doi.org/10.1613/jair.953 Yoon et al. [2020] Yoon, J., Drumright, L.N., Schaar, M.: Anonymization through data synthesis using generative adversarial networks (ADS-GAN). IEEE J. Biomed. Health Informatics 24(8), 2378–2388 (2020) https://doi.org/10.1109/JBHI.2020.2980262 Xu et al. [2019] Xu, L., Skoularidou, M., Cuesta-Infante, A., Veeramachaneni, K.: Modeling tabular data using conditional GAN. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 7333–7343 (2019) Goodfellow et al. [2014] Goodfellow, I.J., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., Bengio, Y.: Generative Adversarial Networks. 
arXiv, preprint (2014). https://doi.org/10.48550/arXiv.1406.2661 Sun and Erath [2015] Sun, L., Erath, A.: A bayesian network approach for population synthesis. Transportation Research Part C: Emerging Technologies 61, 49–62 (2015) https://doi.org/10.1016/j.trc.2015.10.010 Reiter [2005] Reiter, J.P.: Using cart to generate partially synthetic public use microdata. Journal of Official Statistics 21(3), 441–462 (2005) Drechsler and Reiter [2011] Drechsler, J., Reiter, J.P.: An empirical evaluation of easily implemented, nonparametric methods for generating synthetic datasets. Comput. Stat. Data Anal. 55(12), 3232–3243 (2011) European Medicines Agency [2018] European Medicines Agency: External guidance on the implementation of the European Medicines Agency policy on the publication of clinical data for medicinal products for human use. https://www.ema.europa.eu/en/human-regulatory/marketing-authorisation/clinical-data-publication/support-industry/external-guidance-implementation-european-medicines-agency-policy-publication-clinical-data (2018) Health Canada [2019] Health Canada: Public Release of Clinical Information: guidance document. https://www.canada.ca/en/health-canada/services/drug-health-product-review-approval/profile-public-release-clinical-information-guidance/document.html (2019) Karr et al. [2006] Karr, A.F., Kohnen, C.N., Oganian, A., Reiter, J.P., Sanil, A.P.: A framework for evaluating the utility of data altered to protect confidentiality. The American Statistician 60(3), 224–232 (2006) https://doi.org/10.1198/000313006x124640 Zhu et al. [2022] Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). 
https://doi.org/10.1109/BigData55660.2022.10020639 Villani [2009] Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. [2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. 
In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Lautrup, A.D., Hyrup, T., Zimek, A., Schneider-Kamp, P.: Systematised review of generative modelling tools and utility metrics for fully synthetic tabular data. [under review] Gower [1971] Gower, J.C.: A general coefficient of similarity and some of its properties. Biometrics 27(4), 857–871 (1971) https://doi.org/10.2307/2528823 Kamal et al. [2019] Kamal, S., ElEleimy, M., Hegazy, D., Nasr, M.: Hepatitis C Virus (HCV) for Egyptian patients. UCI Machine Learning Repository, dataset (2019). https://doi.org/10.24432/c5989v Chawla et al. [2002] Chawla, N.V., Bowyer, K.W., Hall, L.O., Kegelmeyer, W.P.: SMOTE: synthetic minority over-sampling technique. J. Artif. Intell. Res. 16, 321–357 (2002) https://doi.org/10.1613/jair.953 Yoon et al. [2020] Yoon, J., Drumright, L.N., Schaar, M.: Anonymization through data synthesis using generative adversarial networks (ADS-GAN). IEEE J. Biomed. 
Health Informatics 24(8), 2378–2388 (2020) https://doi.org/10.1109/JBHI.2020.2980262 Xu et al. [2019] Xu, L., Skoularidou, M., Cuesta-Infante, A., Veeramachaneni, K.: Modeling tabular data using conditional GAN. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 7333–7343 (2019) Goodfellow et al. [2014] Goodfellow, I.J., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., Bengio, Y.: Generative Adversarial Networks. arXiv, preprint (2014). https://doi.org/10.48550/arXiv.1406.2661 Sun and Erath [2015] Sun, L., Erath, A.: A bayesian network approach for population synthesis. Transportation Research Part C: Emerging Technologies 61, 49–62 (2015) https://doi.org/10.1016/j.trc.2015.10.010 Reiter [2005] Reiter, J.P.: Using cart to generate partially synthetic public use microdata. Journal of Official Statistics 21(3), 441–462 (2005) Drechsler and Reiter [2011] Drechsler, J., Reiter, J.P.: An empirical evaluation of easily implemented, nonparametric methods for generating synthetic datasets. Comput. Stat. Data Anal. 55(12), 3232–3243 (2011) European Medicines Agency [2018] European Medicines Agency: External guidance on the implementation of the European Medicines Agency policy on the publication of clinical data for medicinal products for human use. https://www.ema.europa.eu/en/human-regulatory/marketing-authorisation/clinical-data-publication/support-industry/external-guidance-implementation-european-medicines-agency-policy-publication-clinical-data (2018) Health Canada [2019] Health Canada: Public Release of Clinical Information: guidance document. https://www.canada.ca/en/health-canada/services/drug-health-product-review-approval/profile-public-release-clinical-information-guidance/document.html (2019) Karr et al. 
[2006] Karr, A.F., Kohnen, C.N., Oganian, A., Reiter, J.P., Sanil, A.P.: A framework for evaluating the utility of data altered to protect confidentiality. The American Statistician 60(3), 224–232 (2006) https://doi.org/10.1198/000313006x124640 Zhu et al. [2022] Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). https://doi.org/10.1109/BigData55660.2022.10020639 Villani [2009] Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. [2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. 
[2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Gower, J.C.: A general coefficient of similarity and some of its properties. Biometrics 27(4), 857–871 (1971) https://doi.org/10.2307/2528823 Kamal et al. 
[2019] Kamal, S., ElEleimy, M., Hegazy, D., Nasr, M.: Hepatitis C Virus (HCV) for Egyptian patients. UCI Machine Learning Repository, dataset (2019). https://doi.org/10.24432/c5989v Chawla et al. [2002] Chawla, N.V., Bowyer, K.W., Hall, L.O., Kegelmeyer, W.P.: SMOTE: synthetic minority over-sampling technique. J. Artif. Intell. Res. 16, 321–357 (2002) https://doi.org/10.1613/jair.953 Yoon et al. [2020] Yoon, J., Drumright, L.N., Schaar, M.: Anonymization through data synthesis using generative adversarial networks (ADS-GAN). IEEE J. Biomed. Health Informatics 24(8), 2378–2388 (2020) https://doi.org/10.1109/JBHI.2020.2980262 Xu et al. [2019] Xu, L., Skoularidou, M., Cuesta-Infante, A., Veeramachaneni, K.: Modeling tabular data using conditional GAN. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 7333–7343 (2019) Goodfellow et al. [2014] Goodfellow, I.J., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., Bengio, Y.: Generative Adversarial Networks. arXiv, preprint (2014). https://doi.org/10.48550/arXiv.1406.2661 Sun and Erath [2015] Sun, L., Erath, A.: A bayesian network approach for population synthesis. Transportation Research Part C: Emerging Technologies 61, 49–62 (2015) https://doi.org/10.1016/j.trc.2015.10.010 Reiter [2005] Reiter, J.P.: Using cart to generate partially synthetic public use microdata. Journal of Official Statistics 21(3), 441–462 (2005) Drechsler and Reiter [2011] Drechsler, J., Reiter, J.P.: An empirical evaluation of easily implemented, nonparametric methods for generating synthetic datasets. Comput. Stat. Data Anal. 
55(12), 3232–3243 (2011) European Medicines Agency [2018] European Medicines Agency: External guidance on the implementation of the European Medicines Agency policy on the publication of clinical data for medicinal products for human use. https://www.ema.europa.eu/en/human-regulatory/marketing-authorisation/clinical-data-publication/support-industry/external-guidance-implementation-european-medicines-agency-policy-publication-clinical-data (2018) Health Canada [2019] Health Canada: Public Release of Clinical Information: guidance document. https://www.canada.ca/en/health-canada/services/drug-health-product-review-approval/profile-public-release-clinical-information-guidance/document.html (2019) Karr et al. [2006] Karr, A.F., Kohnen, C.N., Oganian, A., Reiter, J.P., Sanil, A.P.: A framework for evaluating the utility of data altered to protect confidentiality. The American Statistician 60(3), 224–232 (2006) https://doi.org/10.1198/000313006x124640 Zhu et al. [2022] Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). https://doi.org/10.1109/BigData55660.2022.10020639 Villani [2009] Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. [2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. 
IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. 
[2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Kamal, S., ElEleimy, M., Hegazy, D., Nasr, M.: Hepatitis C Virus (HCV) for Egyptian patients. UCI Machine Learning Repository, dataset (2019). https://doi.org/10.24432/c5989v Chawla et al. [2002] Chawla, N.V., Bowyer, K.W., Hall, L.O., Kegelmeyer, W.P.: SMOTE: synthetic minority over-sampling technique. J. Artif. Intell. Res. 16, 321–357 (2002) https://doi.org/10.1613/jair.953 Yoon et al. [2020] Yoon, J., Drumright, L.N., Schaar, M.: Anonymization through data synthesis using generative adversarial networks (ADS-GAN). IEEE J. Biomed. Health Informatics 24(8), 2378–2388 (2020) https://doi.org/10.1109/JBHI.2020.2980262 Xu et al. [2019] Xu, L., Skoularidou, M., Cuesta-Infante, A., Veeramachaneni, K.: Modeling tabular data using conditional GAN. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 7333–7343 (2019) Goodfellow et al. [2014] Goodfellow, I.J., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., Bengio, Y.: Generative Adversarial Networks. arXiv, preprint (2014). https://doi.org/10.48550/arXiv.1406.2661 Sun and Erath [2015] Sun, L., Erath, A.: A bayesian network approach for population synthesis. 
Transportation Research Part C: Emerging Technologies 61, 49–62 (2015) https://doi.org/10.1016/j.trc.2015.10.010 Reiter [2005] Reiter, J.P.: Using cart to generate partially synthetic public use microdata. Journal of Official Statistics 21(3), 441–462 (2005) Drechsler and Reiter [2011] Drechsler, J., Reiter, J.P.: An empirical evaluation of easily implemented, nonparametric methods for generating synthetic datasets. Comput. Stat. Data Anal. 55(12), 3232–3243 (2011) European Medicines Agency [2018] European Medicines Agency: External guidance on the implementation of the European Medicines Agency policy on the publication of clinical data for medicinal products for human use. https://www.ema.europa.eu/en/human-regulatory/marketing-authorisation/clinical-data-publication/support-industry/external-guidance-implementation-european-medicines-agency-policy-publication-clinical-data (2018) Health Canada [2019] Health Canada: Public Release of Clinical Information: guidance document. https://www.canada.ca/en/health-canada/services/drug-health-product-review-approval/profile-public-release-clinical-information-guidance/document.html (2019) Karr et al. [2006] Karr, A.F., Kohnen, C.N., Oganian, A., Reiter, J.P., Sanil, A.P.: A framework for evaluating the utility of data altered to protect confidentiality. The American Statistician 60(3), 224–232 (2006) https://doi.org/10.1198/000313006x124640 Zhu et al. [2022] Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). https://doi.org/10.1109/BigData55660.2022.10020639 Villani [2009] Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. 
[2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). 
https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Chawla, N.V., Bowyer, K.W., Hall, L.O., Kegelmeyer, W.P.: SMOTE: synthetic minority over-sampling technique. J. Artif. Intell. Res. 16, 321–357 (2002) https://doi.org/10.1613/jair.953 Yoon et al. [2020] Yoon, J., Drumright, L.N., Schaar, M.: Anonymization through data synthesis using generative adversarial networks (ADS-GAN). IEEE J. Biomed. Health Informatics 24(8), 2378–2388 (2020) https://doi.org/10.1109/JBHI.2020.2980262 Xu et al. [2019] Xu, L., Skoularidou, M., Cuesta-Infante, A., Veeramachaneni, K.: Modeling tabular data using conditional GAN. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 7333–7343 (2019) Goodfellow et al. [2014] Goodfellow, I.J., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., Bengio, Y.: Generative Adversarial Networks. arXiv, preprint (2014). 
https://doi.org/10.48550/arXiv.1406.2661 Sun and Erath [2015] Sun, L., Erath, A.: A bayesian network approach for population synthesis. Transportation Research Part C: Emerging Technologies 61, 49–62 (2015) https://doi.org/10.1016/j.trc.2015.10.010 Reiter [2005] Reiter, J.P.: Using cart to generate partially synthetic public use microdata. Journal of Official Statistics 21(3), 441–462 (2005) Drechsler and Reiter [2011] Drechsler, J., Reiter, J.P.: An empirical evaluation of easily implemented, nonparametric methods for generating synthetic datasets. Comput. Stat. Data Anal. 55(12), 3232–3243 (2011) European Medicines Agency [2018] European Medicines Agency: External guidance on the implementation of the European Medicines Agency policy on the publication of clinical data for medicinal products for human use. https://www.ema.europa.eu/en/human-regulatory/marketing-authorisation/clinical-data-publication/support-industry/external-guidance-implementation-european-medicines-agency-policy-publication-clinical-data (2018) Health Canada [2019] Health Canada: Public Release of Clinical Information: guidance document. https://www.canada.ca/en/health-canada/services/drug-health-product-review-approval/profile-public-release-clinical-information-guidance/document.html (2019) Karr et al. [2006] Karr, A.F., Kohnen, C.N., Oganian, A., Reiter, J.P., Sanil, A.P.: A framework for evaluating the utility of data altered to protect confidentiality. The American Statistician 60(3), 224–232 (2006) https://doi.org/10.1198/000313006x124640 Zhu et al. [2022] Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). https://doi.org/10.1109/BigData55660.2022.10020639 Villani [2009] Villani, C.: Optimal Transport. 
14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Karr, A.F., Kohnen, C.N., Oganian, A., Reiter, J.P., Sanil, A.P.: A framework for evaluating the utility of data altered to protect confidentiality. The American Statistician 60(3), 224–232 (2006) https://doi.org/10.1198/000313006x124640 Zhu et al. [2022] Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). 
https://doi.org/10.1109/BigData55660.2022.10020639 Villani [2009] Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. [2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. 
In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). https://doi.org/10.1109/BigData55660.2022.10020639 Villani [2009] Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. [2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. 
[2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. 
[2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. [2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. 
Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. 
[2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. 
[2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. 
[2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. 
Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). 
https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. 
[2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. 
Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. 
[2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. 
[2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. 
[2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083
  4. Emam, K.E., Mosquera, L., Bass, J.: Evaluating identity disclosure risk in fully synthetic health data: Model development and validation. Journal of Medical Internet Research 22(11), 23139 (2020) https://doi.org/10.2196/23139
55(12), 3232–3243 (2011) European Medicines Agency [2018] European Medicines Agency: External guidance on the implementation of the European Medicines Agency policy on the publication of clinical data for medicinal products for human use. https://www.ema.europa.eu/en/human-regulatory/marketing-authorisation/clinical-data-publication/support-industry/external-guidance-implementation-european-medicines-agency-policy-publication-clinical-data (2018) Health Canada [2019] Health Canada: Public Release of Clinical Information: guidance document. https://www.canada.ca/en/health-canada/services/drug-health-product-review-approval/profile-public-release-clinical-information-guidance/document.html (2019) Karr et al. [2006] Karr, A.F., Kohnen, C.N., Oganian, A., Reiter, J.P., Sanil, A.P.: A framework for evaluating the utility of data altered to protect confidentiality. The American Statistician 60(3), 224–232 (2006) https://doi.org/10.1198/000313006x124640 Zhu et al. [2022] Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). https://doi.org/10.1109/BigData55660.2022.10020639 Villani [2009] Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. [2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. 
IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. 
[2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Breugel, B., Kyono, T., Berrevoets, J., Schaar, M.: DECAF: generating fair synthetic data using causally-aware generative networks. In: Ranzato, M., Beygelzimer, A., Dauphin, Y.N., Liang, P., Vaughan, J.W. (eds.) Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6-14, 2021, Virtual, pp. 22221–22233 (2021) Dankar et al. [2022] Dankar, F.K., Ibrahim, M.K., Ismail, L.: A multi-dimensional evaluation of synthetic data generators. IEEE Access 10, 11147–11158 (2022) https://doi.org/10.1109/access.2022.3144765 Figueira and Vaz [2022] Figueira, A., Vaz, B.: Survey on synthetic data generation, evaluation methods and GANs. Mathematics 10(15), 2733 (2022) https://doi.org/10.3390/math10152733 Dwork and Roth [2013] Dwork, C., Roth, A.: The algorithmic foundations of differential privacy. Foundations and Trends® in Theoretical Computer Science 9(3-4), 211–487 (2013) https://doi.org/10.1561/0400000042 Yale et al. [2020] Yale, A., Dash, S., Bhanot, K., Guyon, I., Erickson, J.S., Bennett, K.P.: Synthesizing quality open data assets from private health research studies. In: Abramowicz, W., Klein, G. (eds.) Business Information Systems Workshops - BIS 2020 International Workshops, Colorado Springs, CO, USA, June 8-10, 2020, Revised Selected Papers. Lecture Notes in Business Information Processing, vol. 394, pp. 324–335 (2020). https://doi.org/10.1007/978-3-030-61146-0_26 Lenz et al. 
[2021] Lenz, S., Hess, M., Binder, H.: Deep generative models in DataSHIELD. BMC Medical Research Methodology 21(1) (2021) https://doi.org/10.1186/s12874-021-01237-6 Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X., El-Hussuna, A.: Utility metrics for evaluating synthetic health data generation methods: Validation study. JMIR Medical Informatics 10(4), 35734 (2022) https://doi.org/10.2196/35734 Qian et al. [2023] Qian, Z., Cebere, B., Schaar, M.: Synthcity: facilitating innovative use cases of synthetic data in different data modalities. arXiv, preprint (2023). https://doi.org/10.48550/arXiv.2301.07573 DataCebo, Inc. [2023] DataCebo, Inc.: Synthetic Data Metrics. (2023). DataCebo, Inc.. Version 0.9.3. https://docs.sdv.dev/sdmetrics/ Brenninkmeijer [2021] Brenninkmeijer, B.: Table Evaluator. GitHub (2021) Yale et al. [2019] Yale, A., Dash, S., Dutta, R., Guyon, I., Pavao, A., Bennett, K.P.: Privacy preserving synthetic health data. In: 27th European Symposium on Artificial Neural Networks, ESANN 2019, Bruges, Belgium, April 24-26, 2019 (2019) Yan et al. [2022] Yan, C., Yan, Y., Wan, Z., Zhang, Z., Omberg, L., Guinney, J., Mooney, S.D., Malin, B.A.: A multifaceted benchmarking of synthetic electronic health record generation models. Nature Communications 13(1) (2022) https://doi.org/10.1038/s41467-022-35295-1 Murtaza et al. [2023] Murtaza, H., Ahmed, M., Khan, N.F., Murtaza, G., Zafar, S., Bano, A.: Synthetic data generation: State of the art in health care domain. Computer Science Review 48, 100546 (2023) https://doi.org/10.1016/j.cosrev.2023.100546 [22] Lautrup, A.D., Hyrup, T., Zimek, A., Schneider-Kamp, P.: Systematised review of generative modelling tools and utility metrics for fully synthetic tabular data. [under review] Gower [1971] Gower, J.C.: A general coefficient of similarity and some of its properties. Biometrics 27(4), 857–871 (1971) https://doi.org/10.2307/2528823 Kamal et al. 
[2019] Kamal, S., ElEleimy, M., Hegazy, D., Nasr, M.: Hepatitis C Virus (HCV) for Egyptian patients. UCI Machine Learning Repository, dataset (2019). https://doi.org/10.24432/c5989v Chawla et al. [2002] Chawla, N.V., Bowyer, K.W., Hall, L.O., Kegelmeyer, W.P.: SMOTE: synthetic minority over-sampling technique. J. Artif. Intell. Res. 16, 321–357 (2002) https://doi.org/10.1613/jair.953 Yoon et al. [2020] Yoon, J., Drumright, L.N., Schaar, M.: Anonymization through data synthesis using generative adversarial networks (ADS-GAN). IEEE J. Biomed. Health Informatics 24(8), 2378–2388 (2020) https://doi.org/10.1109/JBHI.2020.2980262 Xu et al. [2019] Xu, L., Skoularidou, M., Cuesta-Infante, A., Veeramachaneni, K.: Modeling tabular data using conditional GAN. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 7333–7343 (2019) Goodfellow et al. [2014] Goodfellow, I.J., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., Bengio, Y.: Generative Adversarial Networks. arXiv, preprint (2014). https://doi.org/10.48550/arXiv.1406.2661 Sun and Erath [2015] Sun, L., Erath, A.: A bayesian network approach for population synthesis. Transportation Research Part C: Emerging Technologies 61, 49–62 (2015) https://doi.org/10.1016/j.trc.2015.10.010 Reiter [2005] Reiter, J.P.: Using cart to generate partially synthetic public use microdata. Journal of Official Statistics 21(3), 441–462 (2005) Drechsler and Reiter [2011] Drechsler, J., Reiter, J.P.: An empirical evaluation of easily implemented, nonparametric methods for generating synthetic datasets. Comput. Stat. Data Anal. 
55(12), 3232–3243 (2011) European Medicines Agency [2018] European Medicines Agency: External guidance on the implementation of the European Medicines Agency policy on the publication of clinical data for medicinal products for human use. https://www.ema.europa.eu/en/human-regulatory/marketing-authorisation/clinical-data-publication/support-industry/external-guidance-implementation-european-medicines-agency-policy-publication-clinical-data (2018) Health Canada [2019] Health Canada: Public Release of Clinical Information: guidance document. https://www.canada.ca/en/health-canada/services/drug-health-product-review-approval/profile-public-release-clinical-information-guidance/document.html (2019) Karr et al. [2006] Karr, A.F., Kohnen, C.N., Oganian, A., Reiter, J.P., Sanil, A.P.: A framework for evaluating the utility of data altered to protect confidentiality. The American Statistician 60(3), 224–232 (2006) https://doi.org/10.1198/000313006x124640 Zhu et al. [2022] Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). https://doi.org/10.1109/BigData55660.2022.10020639 Villani [2009] Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. [2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. 
IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. 
[2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Dankar, F.K., Ibrahim, M.K., Ismail, L.: A multi-dimensional evaluation of synthetic data generators. IEEE Access 10, 11147–11158 (2022) https://doi.org/10.1109/access.2022.3144765 Figueira and Vaz [2022] Figueira, A., Vaz, B.: Survey on synthetic data generation, evaluation methods and GANs. Mathematics 10(15), 2733 (2022) https://doi.org/10.3390/math10152733 Dwork and Roth [2013] Dwork, C., Roth, A.: The algorithmic foundations of differential privacy. Foundations and Trends® in Theoretical Computer Science 9(3-4), 211–487 (2013) https://doi.org/10.1561/0400000042 Yale et al. [2020] Yale, A., Dash, S., Bhanot, K., Guyon, I., Erickson, J.S., Bennett, K.P.: Synthesizing quality open data assets from private health research studies. In: Abramowicz, W., Klein, G. (eds.) Business Information Systems Workshops - BIS 2020 International Workshops, Colorado Springs, CO, USA, June 8-10, 2020, Revised Selected Papers. Lecture Notes in Business Information Processing, vol. 394, pp. 324–335 (2020). https://doi.org/10.1007/978-3-030-61146-0_26 Lenz et al. [2021] Lenz, S., Hess, M., Binder, H.: Deep generative models in DataSHIELD. BMC Medical Research Methodology 21(1) (2021) https://doi.org/10.1186/s12874-021-01237-6 Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X., El-Hussuna, A.: Utility metrics for evaluating synthetic health data generation methods: Validation study. JMIR Medical Informatics 10(4), 35734 (2022) https://doi.org/10.2196/35734 Qian et al. 
[2023] Qian, Z., Cebere, B., Schaar, M.: Synthcity: facilitating innovative use cases of synthetic data in different data modalities. arXiv, preprint (2023). https://doi.org/10.48550/arXiv.2301.07573 DataCebo, Inc. [2023] DataCebo, Inc.: Synthetic Data Metrics. (2023). DataCebo, Inc.. Version 0.9.3. https://docs.sdv.dev/sdmetrics/ Brenninkmeijer [2021] Brenninkmeijer, B.: Table Evaluator. GitHub (2021) Yale et al. [2019] Yale, A., Dash, S., Dutta, R., Guyon, I., Pavao, A., Bennett, K.P.: Privacy preserving synthetic health data. In: 27th European Symposium on Artificial Neural Networks, ESANN 2019, Bruges, Belgium, April 24-26, 2019 (2019) Yan et al. [2022] Yan, C., Yan, Y., Wan, Z., Zhang, Z., Omberg, L., Guinney, J., Mooney, S.D., Malin, B.A.: A multifaceted benchmarking of synthetic electronic health record generation models. Nature Communications 13(1) (2022) https://doi.org/10.1038/s41467-022-35295-1 Murtaza et al. [2023] Murtaza, H., Ahmed, M., Khan, N.F., Murtaza, G., Zafar, S., Bano, A.: Synthetic data generation: State of the art in health care domain. Computer Science Review 48, 100546 (2023) https://doi.org/10.1016/j.cosrev.2023.100546 [22] Lautrup, A.D., Hyrup, T., Zimek, A., Schneider-Kamp, P.: Systematised review of generative modelling tools and utility metrics for fully synthetic tabular data. [under review] Gower [1971] Gower, J.C.: A general coefficient of similarity and some of its properties. Biometrics 27(4), 857–871 (1971) https://doi.org/10.2307/2528823 Kamal et al. [2019] Kamal, S., ElEleimy, M., Hegazy, D., Nasr, M.: Hepatitis C Virus (HCV) for Egyptian patients. UCI Machine Learning Repository, dataset (2019). https://doi.org/10.24432/c5989v Chawla et al. [2002] Chawla, N.V., Bowyer, K.W., Hall, L.O., Kegelmeyer, W.P.: SMOTE: synthetic minority over-sampling technique. J. Artif. Intell. Res. 16, 321–357 (2002) https://doi.org/10.1613/jair.953 Yoon et al. 
[2020] Yoon, J., Drumright, L.N., Schaar, M.: Anonymization through data synthesis using generative adversarial networks (ADS-GAN). IEEE J. Biomed. Health Informatics 24(8), 2378–2388 (2020) https://doi.org/10.1109/JBHI.2020.2980262 Xu et al. [2019] Xu, L., Skoularidou, M., Cuesta-Infante, A., Veeramachaneni, K.: Modeling tabular data using conditional GAN. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 7333–7343 (2019) Goodfellow et al. [2014] Goodfellow, I.J., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., Bengio, Y.: Generative Adversarial Networks. arXiv, preprint (2014). https://doi.org/10.48550/arXiv.1406.2661 Sun and Erath [2015] Sun, L., Erath, A.: A bayesian network approach for population synthesis. Transportation Research Part C: Emerging Technologies 61, 49–62 (2015) https://doi.org/10.1016/j.trc.2015.10.010 Reiter [2005] Reiter, J.P.: Using cart to generate partially synthetic public use microdata. Journal of Official Statistics 21(3), 441–462 (2005) Drechsler and Reiter [2011] Drechsler, J., Reiter, J.P.: An empirical evaluation of easily implemented, nonparametric methods for generating synthetic datasets. Comput. Stat. Data Anal. 55(12), 3232–3243 (2011) European Medicines Agency [2018] European Medicines Agency: External guidance on the implementation of the European Medicines Agency policy on the publication of clinical data for medicinal products for human use. https://www.ema.europa.eu/en/human-regulatory/marketing-authorisation/clinical-data-publication/support-industry/external-guidance-implementation-european-medicines-agency-policy-publication-clinical-data (2018) Health Canada [2019] Health Canada: Public Release of Clinical Information: guidance document. 
https://www.canada.ca/en/health-canada/services/drug-health-product-review-approval/profile-public-release-clinical-information-guidance/document.html (2019) Karr et al. [2006] Karr, A.F., Kohnen, C.N., Oganian, A., Reiter, J.P., Sanil, A.P.: A framework for evaluating the utility of data altered to protect confidentiality. The American Statistician 60(3), 224–232 (2006) https://doi.org/10.1198/000313006x124640 Zhu et al. [2022] Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). https://doi.org/10.1109/BigData55660.2022.10020639 Villani [2009] Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. [2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. 
arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Figueira, A., Vaz, B.: Survey on synthetic data generation, evaluation methods and GANs. 
Mathematics 10(15), 2733 (2022) https://doi.org/10.3390/math10152733 Dwork and Roth [2013] Dwork, C., Roth, A.: The algorithmic foundations of differential privacy. Foundations and Trends® in Theoretical Computer Science 9(3-4), 211–487 (2013) https://doi.org/10.1561/0400000042 Yale et al. [2020] Yale, A., Dash, S., Bhanot, K., Guyon, I., Erickson, J.S., Bennett, K.P.: Synthesizing quality open data assets from private health research studies. In: Abramowicz, W., Klein, G. (eds.) Business Information Systems Workshops - BIS 2020 International Workshops, Colorado Springs, CO, USA, June 8-10, 2020, Revised Selected Papers. Lecture Notes in Business Information Processing, vol. 394, pp. 324–335 (2020). https://doi.org/10.1007/978-3-030-61146-0_26 Lenz et al. [2021] Lenz, S., Hess, M., Binder, H.: Deep generative models in DataSHIELD. BMC Medical Research Methodology 21(1) (2021) https://doi.org/10.1186/s12874-021-01237-6 Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X., El-Hussuna, A.: Utility metrics for evaluating synthetic health data generation methods: Validation study. JMIR Medical Informatics 10(4), 35734 (2022) https://doi.org/10.2196/35734 Qian et al. [2023] Qian, Z., Cebere, B., Schaar, M.: Synthcity: facilitating innovative use cases of synthetic data in different data modalities. arXiv, preprint (2023). https://doi.org/10.48550/arXiv.2301.07573 DataCebo, Inc. [2023] DataCebo, Inc.: Synthetic Data Metrics. (2023). DataCebo, Inc.. Version 0.9.3. https://docs.sdv.dev/sdmetrics/ Brenninkmeijer [2021] Brenninkmeijer, B.: Table Evaluator. GitHub (2021) Yale et al. [2019] Yale, A., Dash, S., Dutta, R., Guyon, I., Pavao, A., Bennett, K.P.: Privacy preserving synthetic health data. In: 27th European Symposium on Artificial Neural Networks, ESANN 2019, Bruges, Belgium, April 24-26, 2019 (2019) Yan et al. 
[2022] Yan, C., Yan, Y., Wan, Z., Zhang, Z., Omberg, L., Guinney, J., Mooney, S.D., Malin, B.A.: A multifaceted benchmarking of synthetic electronic health record generation models. Nature Communications 13(1) (2022) https://doi.org/10.1038/s41467-022-35295-1 Murtaza et al. [2023] Murtaza, H., Ahmed, M., Khan, N.F., Murtaza, G., Zafar, S., Bano, A.: Synthetic data generation: State of the art in health care domain. Computer Science Review 48, 100546 (2023) https://doi.org/10.1016/j.cosrev.2023.100546 [22] Lautrup, A.D., Hyrup, T., Zimek, A., Schneider-Kamp, P.: Systematised review of generative modelling tools and utility metrics for fully synthetic tabular data. [under review] Gower [1971] Gower, J.C.: A general coefficient of similarity and some of its properties. Biometrics 27(4), 857–871 (1971) https://doi.org/10.2307/2528823 Kamal et al. [2019] Kamal, S., ElEleimy, M., Hegazy, D., Nasr, M.: Hepatitis C Virus (HCV) for Egyptian patients. UCI Machine Learning Repository, dataset (2019). https://doi.org/10.24432/c5989v Chawla et al. [2002] Chawla, N.V., Bowyer, K.W., Hall, L.O., Kegelmeyer, W.P.: SMOTE: synthetic minority over-sampling technique. J. Artif. Intell. Res. 16, 321–357 (2002) https://doi.org/10.1613/jair.953 Yoon et al. [2020] Yoon, J., Drumright, L.N., Schaar, M.: Anonymization through data synthesis using generative adversarial networks (ADS-GAN). IEEE J. Biomed. Health Informatics 24(8), 2378–2388 (2020) https://doi.org/10.1109/JBHI.2020.2980262 Xu et al. [2019] Xu, L., Skoularidou, M., Cuesta-Infante, A., Veeramachaneni, K.: Modeling tabular data using conditional GAN. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 7333–7343 (2019) Goodfellow et al. 
[2014] Goodfellow, I.J., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., Bengio, Y.: Generative Adversarial Networks. arXiv, preprint (2014). https://doi.org/10.48550/arXiv.1406.2661 Sun and Erath [2015] Sun, L., Erath, A.: A bayesian network approach for population synthesis. Transportation Research Part C: Emerging Technologies 61, 49–62 (2015) https://doi.org/10.1016/j.trc.2015.10.010 Reiter [2005] Reiter, J.P.: Using cart to generate partially synthetic public use microdata. Journal of Official Statistics 21(3), 441–462 (2005) Drechsler and Reiter [2011] Drechsler, J., Reiter, J.P.: An empirical evaluation of easily implemented, nonparametric methods for generating synthetic datasets. Comput. Stat. Data Anal. 55(12), 3232–3243 (2011) European Medicines Agency [2018] European Medicines Agency: External guidance on the implementation of the European Medicines Agency policy on the publication of clinical data for medicinal products for human use. https://www.ema.europa.eu/en/human-regulatory/marketing-authorisation/clinical-data-publication/support-industry/external-guidance-implementation-european-medicines-agency-policy-publication-clinical-data (2018) Health Canada [2019] Health Canada: Public Release of Clinical Information: guidance document. https://www.canada.ca/en/health-canada/services/drug-health-product-review-approval/profile-public-release-clinical-information-guidance/document.html (2019) Karr et al. [2006] Karr, A.F., Kohnen, C.N., Oganian, A., Reiter, J.P., Sanil, A.P.: A framework for evaluating the utility of data altered to protect confidentiality. The American Statistician 60(3), 224–232 (2006) https://doi.org/10.1198/000313006x124640 Zhu et al. [2022] Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) 
IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). https://doi.org/10.1109/BigData55660.2022.10020639 Villani [2009] Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. [2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. 
[2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Dwork, C., Roth, A.: The algorithmic foundations of differential privacy. Foundations and Trends® in Theoretical Computer Science 9(3-4), 211–487 (2013) https://doi.org/10.1561/0400000042 Yale et al. [2020] Yale, A., Dash, S., Bhanot, K., Guyon, I., Erickson, J.S., Bennett, K.P.: Synthesizing quality open data assets from private health research studies. In: Abramowicz, W., Klein, G. (eds.) Business Information Systems Workshops - BIS 2020 International Workshops, Colorado Springs, CO, USA, June 8-10, 2020, Revised Selected Papers. Lecture Notes in Business Information Processing, vol. 394, pp. 324–335 (2020). https://doi.org/10.1007/978-3-030-61146-0_26 Lenz et al. 
[2021] Lenz, S., Hess, M., Binder, H.: Deep generative models in DataSHIELD. BMC Medical Research Methodology 21(1) (2021) https://doi.org/10.1186/s12874-021-01237-6 Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X., El-Hussuna, A.: Utility metrics for evaluating synthetic health data generation methods: Validation study. JMIR Medical Informatics 10(4), 35734 (2022) https://doi.org/10.2196/35734 Qian et al. [2023] Qian, Z., Cebere, B., Schaar, M.: Synthcity: facilitating innovative use cases of synthetic data in different data modalities. arXiv, preprint (2023). https://doi.org/10.48550/arXiv.2301.07573 DataCebo, Inc. [2023] DataCebo, Inc.: Synthetic Data Metrics. (2023). DataCebo, Inc.. Version 0.9.3. https://docs.sdv.dev/sdmetrics/ Brenninkmeijer [2021] Brenninkmeijer, B.: Table Evaluator. GitHub (2021) Yale et al. [2019] Yale, A., Dash, S., Dutta, R., Guyon, I., Pavao, A., Bennett, K.P.: Privacy preserving synthetic health data. In: 27th European Symposium on Artificial Neural Networks, ESANN 2019, Bruges, Belgium, April 24-26, 2019 (2019) Yan et al. [2022] Yan, C., Yan, Y., Wan, Z., Zhang, Z., Omberg, L., Guinney, J., Mooney, S.D., Malin, B.A.: A multifaceted benchmarking of synthetic electronic health record generation models. Nature Communications 13(1) (2022) https://doi.org/10.1038/s41467-022-35295-1 Murtaza et al. [2023] Murtaza, H., Ahmed, M., Khan, N.F., Murtaza, G., Zafar, S., Bano, A.: Synthetic data generation: State of the art in health care domain. Computer Science Review 48, 100546 (2023) https://doi.org/10.1016/j.cosrev.2023.100546 [22] Lautrup, A.D., Hyrup, T., Zimek, A., Schneider-Kamp, P.: Systematised review of generative modelling tools and utility metrics for fully synthetic tabular data. [under review] Gower [1971] Gower, J.C.: A general coefficient of similarity and some of its properties. Biometrics 27(4), 857–871 (1971) https://doi.org/10.2307/2528823 Kamal et al. 
[2019] Kamal, S., ElEleimy, M., Hegazy, D., Nasr, M.: Hepatitis C Virus (HCV) for Egyptian patients. UCI Machine Learning Repository, dataset (2019). https://doi.org/10.24432/c5989v Chawla et al. [2002] Chawla, N.V., Bowyer, K.W., Hall, L.O., Kegelmeyer, W.P.: SMOTE: synthetic minority over-sampling technique. J. Artif. Intell. Res. 16, 321–357 (2002) https://doi.org/10.1613/jair.953 Yoon et al. [2020] Yoon, J., Drumright, L.N., Schaar, M.: Anonymization through data synthesis using generative adversarial networks (ADS-GAN). IEEE J. Biomed. Health Informatics 24(8), 2378–2388 (2020) https://doi.org/10.1109/JBHI.2020.2980262 Xu et al. [2019] Xu, L., Skoularidou, M., Cuesta-Infante, A., Veeramachaneni, K.: Modeling tabular data using conditional GAN. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 7333–7343 (2019) Goodfellow et al. [2014] Goodfellow, I.J., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., Bengio, Y.: Generative Adversarial Networks. arXiv, preprint (2014). https://doi.org/10.48550/arXiv.1406.2661 Sun and Erath [2015] Sun, L., Erath, A.: A bayesian network approach for population synthesis. Transportation Research Part C: Emerging Technologies 61, 49–62 (2015) https://doi.org/10.1016/j.trc.2015.10.010 Reiter [2005] Reiter, J.P.: Using cart to generate partially synthetic public use microdata. Journal of Official Statistics 21(3), 441–462 (2005) Drechsler and Reiter [2011] Drechsler, J., Reiter, J.P.: An empirical evaluation of easily implemented, nonparametric methods for generating synthetic datasets. Comput. Stat. Data Anal. 
55(12), 3232–3243 (2011) European Medicines Agency [2018] European Medicines Agency: External guidance on the implementation of the European Medicines Agency policy on the publication of clinical data for medicinal products for human use. https://www.ema.europa.eu/en/human-regulatory/marketing-authorisation/clinical-data-publication/support-industry/external-guidance-implementation-european-medicines-agency-policy-publication-clinical-data (2018) Health Canada [2019] Health Canada: Public Release of Clinical Information: guidance document. https://www.canada.ca/en/health-canada/services/drug-health-product-review-approval/profile-public-release-clinical-information-guidance/document.html (2019) Karr et al. [2006] Karr, A.F., Kohnen, C.N., Oganian, A., Reiter, J.P., Sanil, A.P.: A framework for evaluating the utility of data altered to protect confidentiality. The American Statistician 60(3), 224–232 (2006) https://doi.org/10.1198/000313006x124640 Zhu et al. [2022] Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). https://doi.org/10.1109/BigData55660.2022.10020639 Villani [2009] Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. [2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. 
IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. 
[2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Yale, A., Dash, S., Bhanot, K., Guyon, I., Erickson, J.S., Bennett, K.P.: Synthesizing quality open data assets from private health research studies. In: Abramowicz, W., Klein, G. (eds.) Business Information Systems Workshops - BIS 2020 International Workshops, Colorado Springs, CO, USA, June 8-10, 2020, Revised Selected Papers. Lecture Notes in Business Information Processing, vol. 394, pp. 324–335 (2020). https://doi.org/10.1007/978-3-030-61146-0_26 Lenz et al. [2021] Lenz, S., Hess, M., Binder, H.: Deep generative models in DataSHIELD. BMC Medical Research Methodology 21(1) (2021) https://doi.org/10.1186/s12874-021-01237-6 Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X., El-Hussuna, A.: Utility metrics for evaluating synthetic health data generation methods: Validation study. JMIR Medical Informatics 10(4), 35734 (2022) https://doi.org/10.2196/35734 Qian et al. [2023] Qian, Z., Cebere, B., Schaar, M.: Synthcity: facilitating innovative use cases of synthetic data in different data modalities. arXiv, preprint (2023). https://doi.org/10.48550/arXiv.2301.07573 DataCebo, Inc. [2023] DataCebo, Inc.: Synthetic Data Metrics. (2023). DataCebo, Inc.. Version 0.9.3. https://docs.sdv.dev/sdmetrics/ Brenninkmeijer [2021] Brenninkmeijer, B.: Table Evaluator. GitHub (2021) Yale et al. [2019] Yale, A., Dash, S., Dutta, R., Guyon, I., Pavao, A., Bennett, K.P.: Privacy preserving synthetic health data. In: 27th European Symposium on Artificial Neural Networks, ESANN 2019, Bruges, Belgium, April 24-26, 2019 (2019) Yan et al. 
[2022] Yan, C., Yan, Y., Wan, Z., Zhang, Z., Omberg, L., Guinney, J., Mooney, S.D., Malin, B.A.: A multifaceted benchmarking of synthetic electronic health record generation models. Nature Communications 13(1) (2022) https://doi.org/10.1038/s41467-022-35295-1 Murtaza et al. [2023] Murtaza, H., Ahmed, M., Khan, N.F., Murtaza, G., Zafar, S., Bano, A.: Synthetic data generation: State of the art in health care domain. Computer Science Review 48, 100546 (2023) https://doi.org/10.1016/j.cosrev.2023.100546 [22] Lautrup, A.D., Hyrup, T., Zimek, A., Schneider-Kamp, P.: Systematised review of generative modelling tools and utility metrics for fully synthetic tabular data. [under review] Gower [1971] Gower, J.C.: A general coefficient of similarity and some of its properties. Biometrics 27(4), 857–871 (1971) https://doi.org/10.2307/2528823 Kamal et al. [2019] Kamal, S., ElEleimy, M., Hegazy, D., Nasr, M.: Hepatitis C Virus (HCV) for Egyptian patients. UCI Machine Learning Repository, dataset (2019). https://doi.org/10.24432/c5989v Chawla et al. [2002] Chawla, N.V., Bowyer, K.W., Hall, L.O., Kegelmeyer, W.P.: SMOTE: synthetic minority over-sampling technique. J. Artif. Intell. Res. 16, 321–357 (2002) https://doi.org/10.1613/jair.953 Yoon et al. [2020] Yoon, J., Drumright, L.N., Schaar, M.: Anonymization through data synthesis using generative adversarial networks (ADS-GAN). IEEE J. Biomed. Health Informatics 24(8), 2378–2388 (2020) https://doi.org/10.1109/JBHI.2020.2980262 Xu et al. [2019] Xu, L., Skoularidou, M., Cuesta-Infante, A., Veeramachaneni, K.: Modeling tabular data using conditional GAN. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 7333–7343 (2019) Goodfellow et al. 
[2014] Goodfellow, I.J., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., Bengio, Y.: Generative Adversarial Networks. arXiv, preprint (2014). https://doi.org/10.48550/arXiv.1406.2661 Sun and Erath [2015] Sun, L., Erath, A.: A bayesian network approach for population synthesis. Transportation Research Part C: Emerging Technologies 61, 49–62 (2015) https://doi.org/10.1016/j.trc.2015.10.010 Reiter [2005] Reiter, J.P.: Using cart to generate partially synthetic public use microdata. Journal of Official Statistics 21(3), 441–462 (2005) Drechsler and Reiter [2011] Drechsler, J., Reiter, J.P.: An empirical evaluation of easily implemented, nonparametric methods for generating synthetic datasets. Comput. Stat. Data Anal. 55(12), 3232–3243 (2011) European Medicines Agency [2018] European Medicines Agency: External guidance on the implementation of the European Medicines Agency policy on the publication of clinical data for medicinal products for human use. https://www.ema.europa.eu/en/human-regulatory/marketing-authorisation/clinical-data-publication/support-industry/external-guidance-implementation-european-medicines-agency-policy-publication-clinical-data (2018) Health Canada [2019] Health Canada: Public Release of Clinical Information: guidance document. https://www.canada.ca/en/health-canada/services/drug-health-product-review-approval/profile-public-release-clinical-information-guidance/document.html (2019) Karr et al. [2006] Karr, A.F., Kohnen, C.N., Oganian, A., Reiter, J.P., Sanil, A.P.: A framework for evaluating the utility of data altered to protect confidentiality. The American Statistician 60(3), 224–232 (2006) https://doi.org/10.1198/000313006x124640 Zhu et al. [2022] Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) 
IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). https://doi.org/10.1109/BigData55660.2022.10020639 Villani [2009] Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. [2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. 
[2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Lenz, S., Hess, M., Binder, H.: Deep generative models in DataSHIELD. BMC Medical Research Methodology 21(1) (2021) https://doi.org/10.1186/s12874-021-01237-6 Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X., El-Hussuna, A.: Utility metrics for evaluating synthetic health data generation methods: Validation study. JMIR Medical Informatics 10(4), 35734 (2022) https://doi.org/10.2196/35734 Qian et al. [2023] Qian, Z., Cebere, B., Schaar, M.: Synthcity: facilitating innovative use cases of synthetic data in different data modalities. arXiv, preprint (2023). https://doi.org/10.48550/arXiv.2301.07573 DataCebo, Inc. [2023] DataCebo, Inc.: Synthetic Data Metrics. (2023). DataCebo, Inc.. Version 0.9.3. 
https://docs.sdv.dev/sdmetrics/ Brenninkmeijer [2021] Brenninkmeijer, B.: Table Evaluator. GitHub (2021) Yale et al. [2019] Yale, A., Dash, S., Dutta, R., Guyon, I., Pavao, A., Bennett, K.P.: Privacy preserving synthetic health data. In: 27th European Symposium on Artificial Neural Networks, ESANN 2019, Bruges, Belgium, April 24-26, 2019 (2019) Yan et al. [2022] Yan, C., Yan, Y., Wan, Z., Zhang, Z., Omberg, L., Guinney, J., Mooney, S.D., Malin, B.A.: A multifaceted benchmarking of synthetic electronic health record generation models. Nature Communications 13(1) (2022) https://doi.org/10.1038/s41467-022-35295-1 Murtaza et al. [2023] Murtaza, H., Ahmed, M., Khan, N.F., Murtaza, G., Zafar, S., Bano, A.: Synthetic data generation: State of the art in health care domain. Computer Science Review 48, 100546 (2023) https://doi.org/10.1016/j.cosrev.2023.100546 [22] Lautrup, A.D., Hyrup, T., Zimek, A., Schneider-Kamp, P.: Systematised review of generative modelling tools and utility metrics for fully synthetic tabular data. [under review] Gower [1971] Gower, J.C.: A general coefficient of similarity and some of its properties. Biometrics 27(4), 857–871 (1971) https://doi.org/10.2307/2528823 Kamal et al. [2019] Kamal, S., ElEleimy, M., Hegazy, D., Nasr, M.: Hepatitis C Virus (HCV) for Egyptian patients. UCI Machine Learning Repository, dataset (2019). https://doi.org/10.24432/c5989v Chawla et al. [2002] Chawla, N.V., Bowyer, K.W., Hall, L.O., Kegelmeyer, W.P.: SMOTE: synthetic minority over-sampling technique. J. Artif. Intell. Res. 16, 321–357 (2002) https://doi.org/10.1613/jair.953 Yoon et al. [2020] Yoon, J., Drumright, L.N., Schaar, M.: Anonymization through data synthesis using generative adversarial networks (ADS-GAN). IEEE J. Biomed. Health Informatics 24(8), 2378–2388 (2020) https://doi.org/10.1109/JBHI.2020.2980262 Xu et al. [2019] Xu, L., Skoularidou, M., Cuesta-Infante, A., Veeramachaneni, K.: Modeling tabular data using conditional GAN. 
In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 7333–7343 (2019) Goodfellow et al. [2014] Goodfellow, I.J., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., Bengio, Y.: Generative Adversarial Networks. arXiv, preprint (2014). https://doi.org/10.48550/arXiv.1406.2661 Sun and Erath [2015] Sun, L., Erath, A.: A bayesian network approach for population synthesis. Transportation Research Part C: Emerging Technologies 61, 49–62 (2015) https://doi.org/10.1016/j.trc.2015.10.010 Reiter [2005] Reiter, J.P.: Using cart to generate partially synthetic public use microdata. Journal of Official Statistics 21(3), 441–462 (2005) Drechsler and Reiter [2011] Drechsler, J., Reiter, J.P.: An empirical evaluation of easily implemented, nonparametric methods for generating synthetic datasets. Comput. Stat. Data Anal. 55(12), 3232–3243 (2011) European Medicines Agency [2018] European Medicines Agency: External guidance on the implementation of the European Medicines Agency policy on the publication of clinical data for medicinal products for human use. https://www.ema.europa.eu/en/human-regulatory/marketing-authorisation/clinical-data-publication/support-industry/external-guidance-implementation-european-medicines-agency-policy-publication-clinical-data (2018) Health Canada [2019] Health Canada: Public Release of Clinical Information: guidance document. https://www.canada.ca/en/health-canada/services/drug-health-product-review-approval/profile-public-release-clinical-information-guidance/document.html (2019) Karr et al. [2006] Karr, A.F., Kohnen, C.N., Oganian, A., Reiter, J.P., Sanil, A.P.: A framework for evaluating the utility of data altered to protect confidentiality. 
The American Statistician 60(3), 224–232 (2006) https://doi.org/10.1198/000313006x124640 Zhu et al. [2022] Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). https://doi.org/10.1109/BigData55660.2022.10020639 Villani [2009] Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. [2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. 
Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Emam, K.E., Mosquera, L., Fang, X., El-Hussuna, A.: Utility metrics for evaluating synthetic health data generation methods: Validation study. JMIR Medical Informatics 10(4), 35734 (2022) https://doi.org/10.2196/35734 Qian et al. 
[2023] Qian, Z., Cebere, B., Schaar, M.: Synthcity: facilitating innovative use cases of synthetic data in different data modalities. arXiv, preprint (2023). https://doi.org/10.48550/arXiv.2301.07573 DataCebo, Inc. [2023] DataCebo, Inc.: Synthetic Data Metrics. (2023). DataCebo, Inc.. Version 0.9.3. https://docs.sdv.dev/sdmetrics/ Brenninkmeijer [2021] Brenninkmeijer, B.: Table Evaluator. GitHub (2021) Yale et al. [2019] Yale, A., Dash, S., Dutta, R., Guyon, I., Pavao, A., Bennett, K.P.: Privacy preserving synthetic health data. In: 27th European Symposium on Artificial Neural Networks, ESANN 2019, Bruges, Belgium, April 24-26, 2019 (2019) Yan et al. [2022] Yan, C., Yan, Y., Wan, Z., Zhang, Z., Omberg, L., Guinney, J., Mooney, S.D., Malin, B.A.: A multifaceted benchmarking of synthetic electronic health record generation models. Nature Communications 13(1) (2022) https://doi.org/10.1038/s41467-022-35295-1 Murtaza et al. [2023] Murtaza, H., Ahmed, M., Khan, N.F., Murtaza, G., Zafar, S., Bano, A.: Synthetic data generation: State of the art in health care domain. Computer Science Review 48, 100546 (2023) https://doi.org/10.1016/j.cosrev.2023.100546 [22] Lautrup, A.D., Hyrup, T., Zimek, A., Schneider-Kamp, P.: Systematised review of generative modelling tools and utility metrics for fully synthetic tabular data. [under review] Gower [1971] Gower, J.C.: A general coefficient of similarity and some of its properties. Biometrics 27(4), 857–871 (1971) https://doi.org/10.2307/2528823 Kamal et al. [2019] Kamal, S., ElEleimy, M., Hegazy, D., Nasr, M.: Hepatitis C Virus (HCV) for Egyptian patients. UCI Machine Learning Repository, dataset (2019). https://doi.org/10.24432/c5989v Chawla et al. [2002] Chawla, N.V., Bowyer, K.W., Hall, L.O., Kegelmeyer, W.P.: SMOTE: synthetic minority over-sampling technique. J. Artif. Intell. Res. 16, 321–357 (2002) https://doi.org/10.1613/jair.953 Yoon et al. 
[2020] Yoon, J., Drumright, L.N., Schaar, M.: Anonymization through data synthesis using generative adversarial networks (ADS-GAN). IEEE J. Biomed. Health Informatics 24(8), 2378–2388 (2020) https://doi.org/10.1109/JBHI.2020.2980262 Xu et al. [2019] Xu, L., Skoularidou, M., Cuesta-Infante, A., Veeramachaneni, K.: Modeling tabular data using conditional GAN. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 7333–7343 (2019) Goodfellow et al. [2014] Goodfellow, I.J., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., Bengio, Y.: Generative Adversarial Networks. arXiv, preprint (2014). https://doi.org/10.48550/arXiv.1406.2661 Sun and Erath [2015] Sun, L., Erath, A.: A bayesian network approach for population synthesis. Transportation Research Part C: Emerging Technologies 61, 49–62 (2015) https://doi.org/10.1016/j.trc.2015.10.010 Reiter [2005] Reiter, J.P.: Using cart to generate partially synthetic public use microdata. Journal of Official Statistics 21(3), 441–462 (2005) Drechsler and Reiter [2011] Drechsler, J., Reiter, J.P.: An empirical evaluation of easily implemented, nonparametric methods for generating synthetic datasets. Comput. Stat. Data Anal. 55(12), 3232–3243 (2011) European Medicines Agency [2018] European Medicines Agency: External guidance on the implementation of the European Medicines Agency policy on the publication of clinical data for medicinal products for human use. https://www.ema.europa.eu/en/human-regulatory/marketing-authorisation/clinical-data-publication/support-industry/external-guidance-implementation-european-medicines-agency-policy-publication-clinical-data (2018) Health Canada [2019] Health Canada: Public Release of Clinical Information: guidance document. 
https://www.canada.ca/en/health-canada/services/drug-health-product-review-approval/profile-public-release-clinical-information-guidance/document.html (2019) Karr et al. [2006] Karr, A.F., Kohnen, C.N., Oganian, A., Reiter, J.P., Sanil, A.P.: A framework for evaluating the utility of data altered to protect confidentiality. The American Statistician 60(3), 224–232 (2006) https://doi.org/10.1198/000313006x124640 Zhu et al. [2022] Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). https://doi.org/10.1109/BigData55660.2022.10020639 Villani [2009] Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. [2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. 
arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Qian, Z., Cebere, B., Schaar, M.: Synthcity: facilitating innovative use cases of synthetic data in different data modalities. 
arXiv, preprint (2023). https://doi.org/10.48550/arXiv.2301.07573 DataCebo, Inc. [2023] DataCebo, Inc.: Synthetic Data Metrics. (2023). DataCebo, Inc.. Version 0.9.3. https://docs.sdv.dev/sdmetrics/ Brenninkmeijer [2021] Brenninkmeijer, B.: Table Evaluator. GitHub (2021) Yale et al. [2019] Yale, A., Dash, S., Dutta, R., Guyon, I., Pavao, A., Bennett, K.P.: Privacy preserving synthetic health data. In: 27th European Symposium on Artificial Neural Networks, ESANN 2019, Bruges, Belgium, April 24-26, 2019 (2019) Yan et al. [2022] Yan, C., Yan, Y., Wan, Z., Zhang, Z., Omberg, L., Guinney, J., Mooney, S.D., Malin, B.A.: A multifaceted benchmarking of synthetic electronic health record generation models. Nature Communications 13(1) (2022) https://doi.org/10.1038/s41467-022-35295-1 Murtaza et al. [2023] Murtaza, H., Ahmed, M., Khan, N.F., Murtaza, G., Zafar, S., Bano, A.: Synthetic data generation: State of the art in health care domain. Computer Science Review 48, 100546 (2023) https://doi.org/10.1016/j.cosrev.2023.100546 [22] Lautrup, A.D., Hyrup, T., Zimek, A., Schneider-Kamp, P.: Systematised review of generative modelling tools and utility metrics for fully synthetic tabular data. [under review] Gower [1971] Gower, J.C.: A general coefficient of similarity and some of its properties. Biometrics 27(4), 857–871 (1971) https://doi.org/10.2307/2528823 Kamal et al. [2019] Kamal, S., ElEleimy, M., Hegazy, D., Nasr, M.: Hepatitis C Virus (HCV) for Egyptian patients. UCI Machine Learning Repository, dataset (2019). https://doi.org/10.24432/c5989v Chawla et al. [2002] Chawla, N.V., Bowyer, K.W., Hall, L.O., Kegelmeyer, W.P.: SMOTE: synthetic minority over-sampling technique. J. Artif. Intell. Res. 16, 321–357 (2002) https://doi.org/10.1613/jair.953 Yoon et al. [2020] Yoon, J., Drumright, L.N., Schaar, M.: Anonymization through data synthesis using generative adversarial networks (ADS-GAN). IEEE J. Biomed. 
Health Informatics 24(8), 2378–2388 (2020) https://doi.org/10.1109/JBHI.2020.2980262 Xu et al. [2019] Xu, L., Skoularidou, M., Cuesta-Infante, A., Veeramachaneni, K.: Modeling tabular data using conditional GAN. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 7333–7343 (2019) Goodfellow et al. [2014] Goodfellow, I.J., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., Bengio, Y.: Generative Adversarial Networks. arXiv, preprint (2014). https://doi.org/10.48550/arXiv.1406.2661 Sun and Erath [2015] Sun, L., Erath, A.: A bayesian network approach for population synthesis. Transportation Research Part C: Emerging Technologies 61, 49–62 (2015) https://doi.org/10.1016/j.trc.2015.10.010 Reiter [2005] Reiter, J.P.: Using cart to generate partially synthetic public use microdata. Journal of Official Statistics 21(3), 441–462 (2005) Drechsler and Reiter [2011] Drechsler, J., Reiter, J.P.: An empirical evaluation of easily implemented, nonparametric methods for generating synthetic datasets. Comput. Stat. Data Anal. 55(12), 3232–3243 (2011) European Medicines Agency [2018] European Medicines Agency: External guidance on the implementation of the European Medicines Agency policy on the publication of clinical data for medicinal products for human use. https://www.ema.europa.eu/en/human-regulatory/marketing-authorisation/clinical-data-publication/support-industry/external-guidance-implementation-european-medicines-agency-policy-publication-clinical-data (2018) Health Canada [2019] Health Canada: Public Release of Clinical Information: guidance document. https://www.canada.ca/en/health-canada/services/drug-health-product-review-approval/profile-public-release-clinical-information-guidance/document.html (2019) Karr et al. 
[2006] Karr, A.F., Kohnen, C.N., Oganian, A., Reiter, J.P., Sanil, A.P.: A framework for evaluating the utility of data altered to protect confidentiality. The American Statistician 60(3), 224–232 (2006) https://doi.org/10.1198/000313006x124640 Zhu et al. [2022] Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). https://doi.org/10.1109/BigData55660.2022.10020639 Villani [2009] Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. [2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. 
[2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 DataCebo, Inc.: Synthetic Data Metrics. (2023). DataCebo, Inc.. Version 0.9.3. https://docs.sdv.dev/sdmetrics/ Brenninkmeijer [2021] Brenninkmeijer, B.: Table Evaluator. GitHub (2021) Yale et al. 
[2019] Yale, A., Dash, S., Dutta, R., Guyon, I., Pavao, A., Bennett, K.P.: Privacy preserving synthetic health data. In: 27th European Symposium on Artificial Neural Networks, ESANN 2019, Bruges, Belgium, April 24-26, 2019 (2019) Yan et al. [2022] Yan, C., Yan, Y., Wan, Z., Zhang, Z., Omberg, L., Guinney, J., Mooney, S.D., Malin, B.A.: A multifaceted benchmarking of synthetic electronic health record generation models. Nature Communications 13(1) (2022) https://doi.org/10.1038/s41467-022-35295-1 Murtaza et al. [2023] Murtaza, H., Ahmed, M., Khan, N.F., Murtaza, G., Zafar, S., Bano, A.: Synthetic data generation: State of the art in health care domain. Computer Science Review 48, 100546 (2023) https://doi.org/10.1016/j.cosrev.2023.100546 [22] Lautrup, A.D., Hyrup, T., Zimek, A., Schneider-Kamp, P.: Systematised review of generative modelling tools and utility metrics for fully synthetic tabular data. [under review] Gower [1971] Gower, J.C.: A general coefficient of similarity and some of its properties. Biometrics 27(4), 857–871 (1971) https://doi.org/10.2307/2528823 Kamal et al. [2019] Kamal, S., ElEleimy, M., Hegazy, D., Nasr, M.: Hepatitis C Virus (HCV) for Egyptian patients. UCI Machine Learning Repository, dataset (2019). https://doi.org/10.24432/c5989v Chawla et al. [2002] Chawla, N.V., Bowyer, K.W., Hall, L.O., Kegelmeyer, W.P.: SMOTE: synthetic minority over-sampling technique. J. Artif. Intell. Res. 16, 321–357 (2002) https://doi.org/10.1613/jair.953 Yoon et al. [2020] Yoon, J., Drumright, L.N., Schaar, M.: Anonymization through data synthesis using generative adversarial networks (ADS-GAN). IEEE J. Biomed. Health Informatics 24(8), 2378–2388 (2020) https://doi.org/10.1109/JBHI.2020.2980262 Xu et al. [2019] Xu, L., Skoularidou, M., Cuesta-Infante, A., Veeramachaneni, K.: Modeling tabular data using conditional GAN. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) 
Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 7333–7343 (2019) Goodfellow et al. [2014] Goodfellow, I.J., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., Bengio, Y.: Generative Adversarial Networks. arXiv, preprint (2014). https://doi.org/10.48550/arXiv.1406.2661 Sun and Erath [2015] Sun, L., Erath, A.: A bayesian network approach for population synthesis. Transportation Research Part C: Emerging Technologies 61, 49–62 (2015) https://doi.org/10.1016/j.trc.2015.10.010 Reiter [2005] Reiter, J.P.: Using cart to generate partially synthetic public use microdata. Journal of Official Statistics 21(3), 441–462 (2005) Drechsler and Reiter [2011] Drechsler, J., Reiter, J.P.: An empirical evaluation of easily implemented, nonparametric methods for generating synthetic datasets. Comput. Stat. Data Anal. 55(12), 3232–3243 (2011) European Medicines Agency [2018] European Medicines Agency: External guidance on the implementation of the European Medicines Agency policy on the publication of clinical data for medicinal products for human use. https://www.ema.europa.eu/en/human-regulatory/marketing-authorisation/clinical-data-publication/support-industry/external-guidance-implementation-european-medicines-agency-policy-publication-clinical-data (2018) Health Canada [2019] Health Canada: Public Release of Clinical Information: guidance document. https://www.canada.ca/en/health-canada/services/drug-health-product-review-approval/profile-public-release-clinical-information-guidance/document.html (2019) Karr et al. [2006] Karr, A.F., Kohnen, C.N., Oganian, A., Reiter, J.P., Sanil, A.P.: A framework for evaluating the utility of data altered to protect confidentiality. The American Statistician 60(3), 224–232 (2006) https://doi.org/10.1198/000313006x124640 Zhu et al. 
[2022] Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). https://doi.org/10.1109/BigData55660.2022.10020639 Villani [2009] Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. [2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. 
Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Brenninkmeijer, B.: Table Evaluator. GitHub (2021) Yale et al. [2019] Yale, A., Dash, S., Dutta, R., Guyon, I., Pavao, A., Bennett, K.P.: Privacy preserving synthetic health data. In: 27th European Symposium on Artificial Neural Networks, ESANN 2019, Bruges, Belgium, April 24-26, 2019 (2019) Yan et al. 
[2022] Yan, C., Yan, Y., Wan, Z., Zhang, Z., Omberg, L., Guinney, J., Mooney, S.D., Malin, B.A.: A multifaceted benchmarking of synthetic electronic health record generation models. Nature Communications 13(1) (2022) https://doi.org/10.1038/s41467-022-35295-1 Murtaza et al. [2023] Murtaza, H., Ahmed, M., Khan, N.F., Murtaza, G., Zafar, S., Bano, A.: Synthetic data generation: State of the art in health care domain. Computer Science Review 48, 100546 (2023) https://doi.org/10.1016/j.cosrev.2023.100546 [22] Lautrup, A.D., Hyrup, T., Zimek, A., Schneider-Kamp, P.: Systematised review of generative modelling tools and utility metrics for fully synthetic tabular data. [under review] Gower [1971] Gower, J.C.: A general coefficient of similarity and some of its properties. Biometrics 27(4), 857–871 (1971) https://doi.org/10.2307/2528823 Kamal et al. [2019] Kamal, S., ElEleimy, M., Hegazy, D., Nasr, M.: Hepatitis C Virus (HCV) for Egyptian patients. UCI Machine Learning Repository, dataset (2019). https://doi.org/10.24432/c5989v Chawla et al. [2002] Chawla, N.V., Bowyer, K.W., Hall, L.O., Kegelmeyer, W.P.: SMOTE: synthetic minority over-sampling technique. J. Artif. Intell. Res. 16, 321–357 (2002) https://doi.org/10.1613/jair.953 Yoon et al. [2020] Yoon, J., Drumright, L.N., Schaar, M.: Anonymization through data synthesis using generative adversarial networks (ADS-GAN). IEEE J. Biomed. Health Informatics 24(8), 2378–2388 (2020) https://doi.org/10.1109/JBHI.2020.2980262 Xu et al. [2019] Xu, L., Skoularidou, M., Cuesta-Infante, A., Veeramachaneni, K.: Modeling tabular data using conditional GAN. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 7333–7343 (2019) Goodfellow et al. 
[2014] Goodfellow, I.J., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., Bengio, Y.: Generative Adversarial Networks. arXiv, preprint (2014). https://doi.org/10.48550/arXiv.1406.2661 Sun and Erath [2015] Sun, L., Erath, A.: A bayesian network approach for population synthesis. Transportation Research Part C: Emerging Technologies 61, 49–62 (2015) https://doi.org/10.1016/j.trc.2015.10.010 Reiter [2005] Reiter, J.P.: Using cart to generate partially synthetic public use microdata. Journal of Official Statistics 21(3), 441–462 (2005) Drechsler and Reiter [2011] Drechsler, J., Reiter, J.P.: An empirical evaluation of easily implemented, nonparametric methods for generating synthetic datasets. Comput. Stat. Data Anal. 55(12), 3232–3243 (2011) European Medicines Agency [2018] European Medicines Agency: External guidance on the implementation of the European Medicines Agency policy on the publication of clinical data for medicinal products for human use. https://www.ema.europa.eu/en/human-regulatory/marketing-authorisation/clinical-data-publication/support-industry/external-guidance-implementation-european-medicines-agency-policy-publication-clinical-data (2018) Health Canada [2019] Health Canada: Public Release of Clinical Information: guidance document. https://www.canada.ca/en/health-canada/services/drug-health-product-review-approval/profile-public-release-clinical-information-guidance/document.html (2019) Karr et al. [2006] Karr, A.F., Kohnen, C.N., Oganian, A., Reiter, J.P., Sanil, A.P.: A framework for evaluating the utility of data altered to protect confidentiality. The American Statistician 60(3), 224–232 (2006) https://doi.org/10.1198/000313006x124640 Zhu et al. [2022] Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) 
IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). https://doi.org/10.1109/BigData55660.2022.10020639 Villani [2009] Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. [2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. 
[2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Yale, A., Dash, S., Dutta, R., Guyon, I., Pavao, A., Bennett, K.P.: Privacy preserving synthetic health data. In: 27th European Symposium on Artificial Neural Networks, ESANN 2019, Bruges, Belgium, April 24-26, 2019 (2019) Yan et al. [2022] Yan, C., Yan, Y., Wan, Z., Zhang, Z., Omberg, L., Guinney, J., Mooney, S.D., Malin, B.A.: A multifaceted benchmarking of synthetic electronic health record generation models. Nature Communications 13(1) (2022) https://doi.org/10.1038/s41467-022-35295-1 Murtaza et al. [2023] Murtaza, H., Ahmed, M., Khan, N.F., Murtaza, G., Zafar, S., Bano, A.: Synthetic data generation: State of the art in health care domain. 
Computer Science Review 48, 100546 (2023) https://doi.org/10.1016/j.cosrev.2023.100546 [22] Lautrup, A.D., Hyrup, T., Zimek, A., Schneider-Kamp, P.: Systematised review of generative modelling tools and utility metrics for fully synthetic tabular data. [under review] Gower [1971] Gower, J.C.: A general coefficient of similarity and some of its properties. Biometrics 27(4), 857–871 (1971) https://doi.org/10.2307/2528823 Kamal et al. [2019] Kamal, S., ElEleimy, M., Hegazy, D., Nasr, M.: Hepatitis C Virus (HCV) for Egyptian patients. UCI Machine Learning Repository, dataset (2019). https://doi.org/10.24432/c5989v Chawla et al. [2002] Chawla, N.V., Bowyer, K.W., Hall, L.O., Kegelmeyer, W.P.: SMOTE: synthetic minority over-sampling technique. J. Artif. Intell. Res. 16, 321–357 (2002) https://doi.org/10.1613/jair.953 Yoon et al. [2020] Yoon, J., Drumright, L.N., Schaar, M.: Anonymization through data synthesis using generative adversarial networks (ADS-GAN). IEEE J. Biomed. Health Informatics 24(8), 2378–2388 (2020) https://doi.org/10.1109/JBHI.2020.2980262 Xu et al. [2019] Xu, L., Skoularidou, M., Cuesta-Infante, A., Veeramachaneni, K.: Modeling tabular data using conditional GAN. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 7333–7343 (2019) Goodfellow et al. [2014] Goodfellow, I.J., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., Bengio, Y.: Generative Adversarial Networks. arXiv, preprint (2014). https://doi.org/10.48550/arXiv.1406.2661 Sun and Erath [2015] Sun, L., Erath, A.: A bayesian network approach for population synthesis. 
Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. [2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). 
IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Yoon, J., Drumright, L.N., Schaar, M.: Anonymization through data synthesis using generative adversarial networks (ADS-GAN). IEEE J. Biomed. Health Informatics 24(8), 2378–2388 (2020) https://doi.org/10.1109/JBHI.2020.2980262 Xu et al. [2019] Xu, L., Skoularidou, M., Cuesta-Infante, A., Veeramachaneni, K.: Modeling tabular data using conditional GAN. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 7333–7343 (2019) Goodfellow et al. [2014] Goodfellow, I.J., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., Bengio, Y.: Generative Adversarial Networks. arXiv, preprint (2014). https://doi.org/10.48550/arXiv.1406.2661 Sun and Erath [2015] Sun, L., Erath, A.: A bayesian network approach for population synthesis. 
Transportation Research Part C: Emerging Technologies 61, 49–62 (2015) https://doi.org/10.1016/j.trc.2015.10.010 Reiter [2005] Reiter, J.P.: Using cart to generate partially synthetic public use microdata. Journal of Official Statistics 21(3), 441–462 (2005) Drechsler and Reiter [2011] Drechsler, J., Reiter, J.P.: An empirical evaluation of easily implemented, nonparametric methods for generating synthetic datasets. Comput. Stat. Data Anal. 55(12), 3232–3243 (2011) European Medicines Agency [2018] European Medicines Agency: External guidance on the implementation of the European Medicines Agency policy on the publication of clinical data for medicinal products for human use. https://www.ema.europa.eu/en/human-regulatory/marketing-authorisation/clinical-data-publication/support-industry/external-guidance-implementation-european-medicines-agency-policy-publication-clinical-data (2018) Health Canada [2019] Health Canada: Public Release of Clinical Information: guidance document. https://www.canada.ca/en/health-canada/services/drug-health-product-review-approval/profile-public-release-clinical-information-guidance/document.html (2019) Karr et al. [2006] Karr, A.F., Kohnen, C.N., Oganian, A., Reiter, J.P., Sanil, A.P.: A framework for evaluating the utility of data altered to protect confidentiality. The American Statistician 60(3), 224–232 (2006) https://doi.org/10.1198/000313006x124640 Zhu et al. [2022] Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). https://doi.org/10.1109/BigData55660.2022.10020639 Villani [2009] Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. 
[2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). 
https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Xu, L., Skoularidou, M., Cuesta-Infante, A., Veeramachaneni, K.: Modeling tabular data using conditional GAN. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 7333–7343 (2019) Goodfellow et al. [2014] Goodfellow, I.J., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., Bengio, Y.: Generative Adversarial Networks. arXiv, preprint (2014). https://doi.org/10.48550/arXiv.1406.2661 Sun and Erath [2015] Sun, L., Erath, A.: A bayesian network approach for population synthesis. Transportation Research Part C: Emerging Technologies 61, 49–62 (2015) https://doi.org/10.1016/j.trc.2015.10.010 Reiter [2005] Reiter, J.P.: Using cart to generate partially synthetic public use microdata. 
Journal of Official Statistics 21(3), 441–462 (2005) Drechsler and Reiter [2011] Drechsler, J., Reiter, J.P.: An empirical evaluation of easily implemented, nonparametric methods for generating synthetic datasets. Comput. Stat. Data Anal. 55(12), 3232–3243 (2011) European Medicines Agency [2018] European Medicines Agency: External guidance on the implementation of the European Medicines Agency policy on the publication of clinical data for medicinal products for human use. https://www.ema.europa.eu/en/human-regulatory/marketing-authorisation/clinical-data-publication/support-industry/external-guidance-implementation-european-medicines-agency-policy-publication-clinical-data (2018) Health Canada [2019] Health Canada: Public Release of Clinical Information: guidance document. https://www.canada.ca/en/health-canada/services/drug-health-product-review-approval/profile-public-release-clinical-information-guidance/document.html (2019) Karr et al. [2006] Karr, A.F., Kohnen, C.N., Oganian, A., Reiter, J.P., Sanil, A.P.: A framework for evaluating the utility of data altered to protect confidentiality. The American Statistician 60(3), 224–232 (2006) https://doi.org/10.1198/000313006x124640 Zhu et al. [2022] Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). https://doi.org/10.1109/BigData55660.2022.10020639 Villani [2009] Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. [2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. 
Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 
13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Goodfellow, I.J., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., Bengio, Y.: Generative Adversarial Networks. arXiv, preprint (2014). https://doi.org/10.48550/arXiv.1406.2661 Sun and Erath [2015] Sun, L., Erath, A.: A bayesian network approach for population synthesis. Transportation Research Part C: Emerging Technologies 61, 49–62 (2015) https://doi.org/10.1016/j.trc.2015.10.010 Reiter [2005] Reiter, J.P.: Using cart to generate partially synthetic public use microdata. Journal of Official Statistics 21(3), 441–462 (2005) Drechsler and Reiter [2011] Drechsler, J., Reiter, J.P.: An empirical evaluation of easily implemented, nonparametric methods for generating synthetic datasets. Comput. Stat. Data Anal. 55(12), 3232–3243 (2011) European Medicines Agency [2018] European Medicines Agency: External guidance on the implementation of the European Medicines Agency policy on the publication of clinical data for medicinal products for human use. 
https://www.ema.europa.eu/en/human-regulatory/marketing-authorisation/clinical-data-publication/support-industry/external-guidance-implementation-european-medicines-agency-policy-publication-clinical-data (2018) Health Canada [2019] Health Canada: Public Release of Clinical Information: guidance document. https://www.canada.ca/en/health-canada/services/drug-health-product-review-approval/profile-public-release-clinical-information-guidance/document.html (2019) Karr et al. [2006] Karr, A.F., Kohnen, C.N., Oganian, A., Reiter, J.P., Sanil, A.P.: A framework for evaluating the utility of data altered to protect confidentiality. The American Statistician 60(3), 224–232 (2006) https://doi.org/10.1198/000313006x124640 Zhu et al. [2022] Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). https://doi.org/10.1109/BigData55660.2022.10020639 Villani [2009] Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. [2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. 
[2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. 
[2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Sun, L., Erath, A.: A bayesian network approach for population synthesis. Transportation Research Part C: Emerging Technologies 61, 49–62 (2015) https://doi.org/10.1016/j.trc.2015.10.010 Reiter [2005] Reiter, J.P.: Using cart to generate partially synthetic public use microdata. Journal of Official Statistics 21(3), 441–462 (2005) Drechsler and Reiter [2011] Drechsler, J., Reiter, J.P.: An empirical evaluation of easily implemented, nonparametric methods for generating synthetic datasets. Comput. Stat. Data Anal. 55(12), 3232–3243 (2011) European Medicines Agency [2018] European Medicines Agency: External guidance on the implementation of the European Medicines Agency policy on the publication of clinical data for medicinal products for human use. https://www.ema.europa.eu/en/human-regulatory/marketing-authorisation/clinical-data-publication/support-industry/external-guidance-implementation-european-medicines-agency-policy-publication-clinical-data (2018) Health Canada [2019] Health Canada: Public Release of Clinical Information: guidance document. https://www.canada.ca/en/health-canada/services/drug-health-product-review-approval/profile-public-release-clinical-information-guidance/document.html (2019) Karr et al. [2006] Karr, A.F., Kohnen, C.N., Oganian, A., Reiter, J.P., Sanil, A.P.: A framework for evaluating the utility of data altered to protect confidentiality. The American Statistician 60(3), 224–232 (2006) https://doi.org/10.1198/000313006x124640 Zhu et al. [2022] Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. 
IEEE, Osaka, Japan, December 17-20 (2022). https://doi.org/10.1109/BigData55660.2022.10020639 Villani [2009] Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. [2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. 
[2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Reiter, J.P.: Using cart to generate partially synthetic public use microdata. Journal of Official Statistics 21(3), 441–462 (2005) Drechsler and Reiter [2011] Drechsler, J., Reiter, J.P.: An empirical evaluation of easily implemented, nonparametric methods for generating synthetic datasets. Comput. Stat. Data Anal. 55(12), 3232–3243 (2011) European Medicines Agency [2018] European Medicines Agency: External guidance on the implementation of the European Medicines Agency policy on the publication of clinical data for medicinal products for human use. 
[2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. 
[2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. 
[2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083
  5. Abouelmehdi, K., Beni-Hessane, A., Khaloufi, H.: Big healthcare data: preserving security and privacy. Journal of Big Data 5(1) (2018) https://doi.org/10.1186/s40537-017-0110-7
Nowok et al. [2016] Nowok, B., Raab, G.M., Dibben, C.: synthpop: Bespoke creation of synthetic data in r. Journal of Statistical Software 74(11) (2016) https://doi.org/10.18637/jss.v074.i11
Rankin et al. [2020] Rankin, D., Black, M., Bond, R., Wallace, J., Mulvenna, M., Epelde, G.: Reliability of supervised machine learning using synthetic data in health care: Model to preserve privacy for data sharing. JMIR Medical Informatics 8(7), 18910 (2020) https://doi.org/10.2196/18910
van Breugel et al. [2021] Breugel, B., Kyono, T., Berrevoets, J., Schaar, M.: DECAF: generating fair synthetic data using causally-aware generative networks. In: Ranzato, M., Beygelzimer, A., Dauphin, Y.N., Liang, P., Vaughan, J.W. (eds.) Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6-14, 2021, Virtual, pp. 22221–22233 (2021)
Dankar et al. [2022] Dankar, F.K., Ibrahim, M.K., Ismail, L.: A multi-dimensional evaluation of synthetic data generators. IEEE Access 10, 11147–11158 (2022) https://doi.org/10.1109/access.2022.3144765
Figueira and Vaz [2022] Figueira, A., Vaz, B.: Survey on synthetic data generation, evaluation methods and GANs. Mathematics 10(15), 2733 (2022) https://doi.org/10.3390/math10152733
Dwork and Roth [2013] Dwork, C., Roth, A.: The algorithmic foundations of differential privacy. Foundations and Trends® in Theoretical Computer Science 9(3-4), 211–487 (2013) https://doi.org/10.1561/0400000042
Yale et al. [2020] Yale, A., Dash, S., Bhanot, K., Guyon, I., Erickson, J.S., Bennett, K.P.: Synthesizing quality open data assets from private health research studies. In: Abramowicz, W., Klein, G. (eds.) Business Information Systems Workshops - BIS 2020 International Workshops, Colorado Springs, CO, USA, June 8-10, 2020, Revised Selected Papers. Lecture Notes in Business Information Processing, vol. 394, pp. 324–335 (2020). https://doi.org/10.1007/978-3-030-61146-0_26
Lenz et al. [2021] Lenz, S., Hess, M., Binder, H.: Deep generative models in DataSHIELD. BMC Medical Research Methodology 21(1) (2021) https://doi.org/10.1186/s12874-021-01237-6
Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X., El-Hussuna, A.: Utility metrics for evaluating synthetic health data generation methods: Validation study. JMIR Medical Informatics 10(4), 35734 (2022) https://doi.org/10.2196/35734
Qian et al. [2023] Qian, Z., Cebere, B., Schaar, M.: Synthcity: facilitating innovative use cases of synthetic data in different data modalities. arXiv, preprint (2023). https://doi.org/10.48550/arXiv.2301.07573
DataCebo, Inc. [2023] DataCebo, Inc.: Synthetic Data Metrics. (2023). DataCebo, Inc. Version 0.9.3. https://docs.sdv.dev/sdmetrics/
Brenninkmeijer [2021] Brenninkmeijer, B.: Table Evaluator. GitHub (2021)
Yale et al. [2019] Yale, A., Dash, S., Dutta, R., Guyon, I., Pavao, A., Bennett, K.P.: Privacy preserving synthetic health data. In: 27th European Symposium on Artificial Neural Networks, ESANN 2019, Bruges, Belgium, April 24-26, 2019 (2019)
Yan et al. [2022] Yan, C., Yan, Y., Wan, Z., Zhang, Z., Omberg, L., Guinney, J., Mooney, S.D., Malin, B.A.: A multifaceted benchmarking of synthetic electronic health record generation models. Nature Communications 13(1) (2022) https://doi.org/10.1038/s41467-022-35295-1
Murtaza et al. [2023] Murtaza, H., Ahmed, M., Khan, N.F., Murtaza, G., Zafar, S., Bano, A.: Synthetic data generation: State of the art in health care domain. Computer Science Review 48, 100546 (2023) https://doi.org/10.1016/j.cosrev.2023.100546
[22] Lautrup, A.D., Hyrup, T., Zimek, A., Schneider-Kamp, P.: Systematised review of generative modelling tools and utility metrics for fully synthetic tabular data. [under review]
Gower [1971] Gower, J.C.: A general coefficient of similarity and some of its properties. Biometrics 27(4), 857–871 (1971) https://doi.org/10.2307/2528823
Kamal et al. [2019] Kamal, S., ElEleimy, M., Hegazy, D., Nasr, M.: Hepatitis C Virus (HCV) for Egyptian patients. UCI Machine Learning Repository, dataset (2019). https://doi.org/10.24432/c5989v
Chawla et al. [2002] Chawla, N.V., Bowyer, K.W., Hall, L.O., Kegelmeyer, W.P.: SMOTE: synthetic minority over-sampling technique. J. Artif. Intell. Res. 16, 321–357 (2002) https://doi.org/10.1613/jair.953
Yoon et al. [2020] Yoon, J., Drumright, L.N., Schaar, M.: Anonymization through data synthesis using generative adversarial networks (ADS-GAN). IEEE J. Biomed. Health Informatics 24(8), 2378–2388 (2020) https://doi.org/10.1109/JBHI.2020.2980262
Xu et al. [2019] Xu, L., Skoularidou, M., Cuesta-Infante, A., Veeramachaneni, K.: Modeling tabular data using conditional GAN. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 7333–7343 (2019)
Goodfellow et al. [2014] Goodfellow, I.J., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., Bengio, Y.: Generative Adversarial Networks. arXiv, preprint (2014). https://doi.org/10.48550/arXiv.1406.2661
Sun and Erath [2015] Sun, L., Erath, A.: A bayesian network approach for population synthesis. Transportation Research Part C: Emerging Technologies 61, 49–62 (2015) https://doi.org/10.1016/j.trc.2015.10.010
Reiter [2005] Reiter, J.P.: Using cart to generate partially synthetic public use microdata. Journal of Official Statistics 21(3), 441–462 (2005)
Drechsler and Reiter [2011] Drechsler, J., Reiter, J.P.: An empirical evaluation of easily implemented, nonparametric methods for generating synthetic datasets. Comput. Stat. Data Anal. 55(12), 3232–3243 (2011)
European Medicines Agency [2018] European Medicines Agency: External guidance on the implementation of the European Medicines Agency policy on the publication of clinical data for medicinal products for human use. https://www.ema.europa.eu/en/human-regulatory/marketing-authorisation/clinical-data-publication/support-industry/external-guidance-implementation-european-medicines-agency-policy-publication-clinical-data (2018)
Health Canada [2019] Health Canada: Public Release of Clinical Information: guidance document. https://www.canada.ca/en/health-canada/services/drug-health-product-review-approval/profile-public-release-clinical-information-guidance/document.html (2019)
Karr et al. [2006] Karr, A.F., Kohnen, C.N., Oganian, A., Reiter, J.P., Sanil, A.P.: A framework for evaluating the utility of data altered to protect confidentiality. The American Statistician 60(3), 224–232 (2006) https://doi.org/10.1198/000313006x124640
Zhu et al. [2022] Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). https://doi.org/10.1109/BigData55660.2022.10020639
Villani [2009] Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9
Hesterberg et al. [2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16
Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722
Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605
Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568
Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078
Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358
Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298
Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021).
https://doi.org/10.1109/i3cat53310.2021.9629420
Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802
Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369
Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020)
Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083
arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Breugel, B., Kyono, T., Berrevoets, J., Schaar, M.: DECAF: generating fair synthetic data using causally-aware generative networks. 
[2020] Yoon, J., Drumright, L.N., Schaar, M.: Anonymization through data synthesis using generative adversarial networks (ADS-GAN). IEEE J. Biomed. Health Informatics 24(8), 2378–2388 (2020) https://doi.org/10.1109/JBHI.2020.2980262 Xu et al. [2019] Xu, L., Skoularidou, M., Cuesta-Infante, A., Veeramachaneni, K.: Modeling tabular data using conditional GAN. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 7333–7343 (2019) Goodfellow et al. [2014] Goodfellow, I.J., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., Bengio, Y.: Generative Adversarial Networks. arXiv, preprint (2014). https://doi.org/10.48550/arXiv.1406.2661 Sun and Erath [2015] Sun, L., Erath, A.: A bayesian network approach for population synthesis. Transportation Research Part C: Emerging Technologies 61, 49–62 (2015) https://doi.org/10.1016/j.trc.2015.10.010 Reiter [2005] Reiter, J.P.: Using cart to generate partially synthetic public use microdata. Journal of Official Statistics 21(3), 441–462 (2005) Drechsler and Reiter [2011] Drechsler, J., Reiter, J.P.: An empirical evaluation of easily implemented, nonparametric methods for generating synthetic datasets. Comput. Stat. Data Anal. 55(12), 3232–3243 (2011) European Medicines Agency [2018] European Medicines Agency: External guidance on the implementation of the European Medicines Agency policy on the publication of clinical data for medicinal products for human use. https://www.ema.europa.eu/en/human-regulatory/marketing-authorisation/clinical-data-publication/support-industry/external-guidance-implementation-european-medicines-agency-policy-publication-clinical-data (2018) Health Canada [2019] Health Canada: Public Release of Clinical Information: guidance document. 
https://www.canada.ca/en/health-canada/services/drug-health-product-review-approval/profile-public-release-clinical-information-guidance/document.html (2019) Karr et al. [2006] Karr, A.F., Kohnen, C.N., Oganian, A., Reiter, J.P., Sanil, A.P.: A framework for evaluating the utility of data altered to protect confidentiality. The American Statistician 60(3), 224–232 (2006) https://doi.org/10.1198/000313006x124640 Zhu et al. [2022] Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). https://doi.org/10.1109/BigData55660.2022.10020639 Villani [2009] Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. [2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. 
arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. 
JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Yale, A., Dash, S., Bhanot, K., Guyon, I., Erickson, J.S., Bennett, K.P.: Synthesizing quality open data assets from private health research studies. In: Abramowicz, W., Klein, G. (eds.) Business Information Systems Workshops - BIS 2020 International Workshops, Colorado Springs, CO, USA, June 8-10, 2020, Revised Selected Papers. Lecture Notes in Business Information Processing, vol. 394, pp. 324–335 (2020). https://doi.org/10.1007/978-3-030-61146-0_26 Lenz et al. [2021] Lenz, S., Hess, M., Binder, H.: Deep generative models in DataSHIELD. BMC Medical Research Methodology 21(1) (2021) https://doi.org/10.1186/s12874-021-01237-6 Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X., El-Hussuna, A.: Utility metrics for evaluating synthetic health data generation methods: Validation study. JMIR Medical Informatics 10(4), 35734 (2022) https://doi.org/10.2196/35734 Qian et al. [2023] Qian, Z., Cebere, B., Schaar, M.: Synthcity: facilitating innovative use cases of synthetic data in different data modalities. arXiv, preprint (2023). https://doi.org/10.48550/arXiv.2301.07573 DataCebo, Inc. [2023] DataCebo, Inc.: Synthetic Data Metrics. (2023). DataCebo, Inc.. Version 0.9.3. https://docs.sdv.dev/sdmetrics/ Brenninkmeijer [2021] Brenninkmeijer, B.: Table Evaluator. GitHub (2021) Yale et al. [2019] Yale, A., Dash, S., Dutta, R., Guyon, I., Pavao, A., Bennett, K.P.: Privacy preserving synthetic health data. In: 27th European Symposium on Artificial Neural Networks, ESANN 2019, Bruges, Belgium, April 24-26, 2019 (2019) Yan et al. [2022] Yan, C., Yan, Y., Wan, Z., Zhang, Z., Omberg, L., Guinney, J., Mooney, S.D., Malin, B.A.: A multifaceted benchmarking of synthetic electronic health record generation models. Nature Communications 13(1) (2022) https://doi.org/10.1038/s41467-022-35295-1 Murtaza et al. 
[2023] Murtaza, H., Ahmed, M., Khan, N.F., Murtaza, G., Zafar, S., Bano, A.: Synthetic data generation: State of the art in health care domain. Computer Science Review 48, 100546 (2023) https://doi.org/10.1016/j.cosrev.2023.100546 [22] Lautrup, A.D., Hyrup, T., Zimek, A., Schneider-Kamp, P.: Systematised review of generative modelling tools and utility metrics for fully synthetic tabular data. [under review] Gower [1971] Gower, J.C.: A general coefficient of similarity and some of its properties. Biometrics 27(4), 857–871 (1971) https://doi.org/10.2307/2528823 Kamal et al. [2019] Kamal, S., ElEleimy, M., Hegazy, D., Nasr, M.: Hepatitis C Virus (HCV) for Egyptian patients. UCI Machine Learning Repository, dataset (2019). https://doi.org/10.24432/c5989v Chawla et al. [2002] Chawla, N.V., Bowyer, K.W., Hall, L.O., Kegelmeyer, W.P.: SMOTE: synthetic minority over-sampling technique. J. Artif. Intell. Res. 16, 321–357 (2002) https://doi.org/10.1613/jair.953 Yoon et al. [2020] Yoon, J., Drumright, L.N., Schaar, M.: Anonymization through data synthesis using generative adversarial networks (ADS-GAN). IEEE J. Biomed. Health Informatics 24(8), 2378–2388 (2020) https://doi.org/10.1109/JBHI.2020.2980262 Xu et al. [2019] Xu, L., Skoularidou, M., Cuesta-Infante, A., Veeramachaneni, K.: Modeling tabular data using conditional GAN. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 7333–7343 (2019) Goodfellow et al. [2014] Goodfellow, I.J., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., Bengio, Y.: Generative Adversarial Networks. arXiv, preprint (2014). https://doi.org/10.48550/arXiv.1406.2661 Sun and Erath [2015] Sun, L., Erath, A.: A bayesian network approach for population synthesis. 
Transportation Research Part C: Emerging Technologies 61, 49–62 (2015) https://doi.org/10.1016/j.trc.2015.10.010 Reiter [2005] Reiter, J.P.: Using cart to generate partially synthetic public use microdata. Journal of Official Statistics 21(3), 441–462 (2005) Drechsler and Reiter [2011] Drechsler, J., Reiter, J.P.: An empirical evaluation of easily implemented, nonparametric methods for generating synthetic datasets. Comput. Stat. Data Anal. 55(12), 3232–3243 (2011) European Medicines Agency [2018] European Medicines Agency: External guidance on the implementation of the European Medicines Agency policy on the publication of clinical data for medicinal products for human use. https://www.ema.europa.eu/en/human-regulatory/marketing-authorisation/clinical-data-publication/support-industry/external-guidance-implementation-european-medicines-agency-policy-publication-clinical-data (2018) Health Canada [2019] Health Canada: Public Release of Clinical Information: guidance document. https://www.canada.ca/en/health-canada/services/drug-health-product-review-approval/profile-public-release-clinical-information-guidance/document.html (2019) Karr et al. [2006] Karr, A.F., Kohnen, C.N., Oganian, A., Reiter, J.P., Sanil, A.P.: A framework for evaluating the utility of data altered to protect confidentiality. The American Statistician 60(3), 224–232 (2006) https://doi.org/10.1198/000313006x124640 Zhu et al. [2022] Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). https://doi.org/10.1109/BigData55660.2022.10020639 Villani [2009] Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. 
[2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). 
https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Lenz, S., Hess, M., Binder, H.: Deep generative models in DataSHIELD. BMC Medical Research Methodology 21(1) (2021) https://doi.org/10.1186/s12874-021-01237-6 Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X., El-Hussuna, A.: Utility metrics for evaluating synthetic health data generation methods: Validation study. JMIR Medical Informatics 10(4), 35734 (2022) https://doi.org/10.2196/35734 Qian et al. [2023] Qian, Z., Cebere, B., Schaar, M.: Synthcity: facilitating innovative use cases of synthetic data in different data modalities. arXiv, preprint (2023). https://doi.org/10.48550/arXiv.2301.07573 DataCebo, Inc. [2023] DataCebo, Inc.: Synthetic Data Metrics. (2023). DataCebo, Inc.. Version 0.9.3. https://docs.sdv.dev/sdmetrics/ Brenninkmeijer [2021] Brenninkmeijer, B.: Table Evaluator. GitHub (2021) Yale et al. [2019] Yale, A., Dash, S., Dutta, R., Guyon, I., Pavao, A., Bennett, K.P.: Privacy preserving synthetic health data. 
In: 27th European Symposium on Artificial Neural Networks, ESANN 2019, Bruges, Belgium, April 24-26, 2019 (2019) Yan et al. [2022] Yan, C., Yan, Y., Wan, Z., Zhang, Z., Omberg, L., Guinney, J., Mooney, S.D., Malin, B.A.: A multifaceted benchmarking of synthetic electronic health record generation models. Nature Communications 13(1) (2022) https://doi.org/10.1038/s41467-022-35295-1 Murtaza et al. [2023] Murtaza, H., Ahmed, M., Khan, N.F., Murtaza, G., Zafar, S., Bano, A.: Synthetic data generation: State of the art in health care domain. Computer Science Review 48, 100546 (2023) https://doi.org/10.1016/j.cosrev.2023.100546 [22] Lautrup, A.D., Hyrup, T., Zimek, A., Schneider-Kamp, P.: Systematised review of generative modelling tools and utility metrics for fully synthetic tabular data. [under review] Gower [1971] Gower, J.C.: A general coefficient of similarity and some of its properties. Biometrics 27(4), 857–871 (1971) https://doi.org/10.2307/2528823 Kamal et al. [2019] Kamal, S., ElEleimy, M., Hegazy, D., Nasr, M.: Hepatitis C Virus (HCV) for Egyptian patients. UCI Machine Learning Repository, dataset (2019). https://doi.org/10.24432/c5989v Chawla et al. [2002] Chawla, N.V., Bowyer, K.W., Hall, L.O., Kegelmeyer, W.P.: SMOTE: synthetic minority over-sampling technique. J. Artif. Intell. Res. 16, 321–357 (2002) https://doi.org/10.1613/jair.953 Yoon et al. [2020] Yoon, J., Drumright, L.N., Schaar, M.: Anonymization through data synthesis using generative adversarial networks (ADS-GAN). IEEE J. Biomed. Health Informatics 24(8), 2378–2388 (2020) https://doi.org/10.1109/JBHI.2020.2980262 Xu et al. [2019] Xu, L., Skoularidou, M., Cuesta-Infante, A., Veeramachaneni, K.: Modeling tabular data using conditional GAN. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) 
Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 7333–7343 (2019) Goodfellow et al. [2014] Goodfellow, I.J., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., Bengio, Y.: Generative Adversarial Networks. arXiv, preprint (2014). https://doi.org/10.48550/arXiv.1406.2661 Sun and Erath [2015] Sun, L., Erath, A.: A bayesian network approach for population synthesis. Transportation Research Part C: Emerging Technologies 61, 49–62 (2015) https://doi.org/10.1016/j.trc.2015.10.010 Reiter [2005] Reiter, J.P.: Using cart to generate partially synthetic public use microdata. Journal of Official Statistics 21(3), 441–462 (2005) Drechsler and Reiter [2011] Drechsler, J., Reiter, J.P.: An empirical evaluation of easily implemented, nonparametric methods for generating synthetic datasets. Comput. Stat. Data Anal. 55(12), 3232–3243 (2011) European Medicines Agency [2018] European Medicines Agency: External guidance on the implementation of the European Medicines Agency policy on the publication of clinical data for medicinal products for human use. https://www.ema.europa.eu/en/human-regulatory/marketing-authorisation/clinical-data-publication/support-industry/external-guidance-implementation-european-medicines-agency-policy-publication-clinical-data (2018) Health Canada [2019] Health Canada: Public Release of Clinical Information: guidance document. https://www.canada.ca/en/health-canada/services/drug-health-product-review-approval/profile-public-release-clinical-information-guidance/document.html (2019) Karr et al. [2006] Karr, A.F., Kohnen, C.N., Oganian, A., Reiter, J.P., Sanil, A.P.: A framework for evaluating the utility of data altered to protect confidentiality. The American Statistician 60(3), 224–232 (2006) https://doi.org/10.1198/000313006x124640 Zhu et al. 
[2022] Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). https://doi.org/10.1109/BigData55660.2022.10020639 Villani [2009] Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. [2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. 
Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Emam, K.E., Mosquera, L., Fang, X., El-Hussuna, A.: Utility metrics for evaluating synthetic health data generation methods: Validation study. JMIR Medical Informatics 10(4), 35734 (2022) https://doi.org/10.2196/35734 Qian et al. 
[2023] Qian, Z., Cebere, B., Schaar, M.: Synthcity: facilitating innovative use cases of synthetic data in different data modalities. arXiv, preprint (2023). https://doi.org/10.48550/arXiv.2301.07573 DataCebo, Inc. [2023] DataCebo, Inc.: Synthetic Data Metrics. (2023). DataCebo, Inc.. Version 0.9.3. https://docs.sdv.dev/sdmetrics/ Brenninkmeijer [2021] Brenninkmeijer, B.: Table Evaluator. GitHub (2021) Yale et al. [2019] Yale, A., Dash, S., Dutta, R., Guyon, I., Pavao, A., Bennett, K.P.: Privacy preserving synthetic health data. In: 27th European Symposium on Artificial Neural Networks, ESANN 2019, Bruges, Belgium, April 24-26, 2019 (2019) Yan et al. [2022] Yan, C., Yan, Y., Wan, Z., Zhang, Z., Omberg, L., Guinney, J., Mooney, S.D., Malin, B.A.: A multifaceted benchmarking of synthetic electronic health record generation models. Nature Communications 13(1) (2022) https://doi.org/10.1038/s41467-022-35295-1 Murtaza et al. [2023] Murtaza, H., Ahmed, M., Khan, N.F., Murtaza, G., Zafar, S., Bano, A.: Synthetic data generation: State of the art in health care domain. Computer Science Review 48, 100546 (2023) https://doi.org/10.1016/j.cosrev.2023.100546 [22] Lautrup, A.D., Hyrup, T., Zimek, A., Schneider-Kamp, P.: Systematised review of generative modelling tools and utility metrics for fully synthetic tabular data. [under review] Gower [1971] Gower, J.C.: A general coefficient of similarity and some of its properties. Biometrics 27(4), 857–871 (1971) https://doi.org/10.2307/2528823 Kamal et al. [2019] Kamal, S., ElEleimy, M., Hegazy, D., Nasr, M.: Hepatitis C Virus (HCV) for Egyptian patients. UCI Machine Learning Repository, dataset (2019). https://doi.org/10.24432/c5989v Chawla et al. [2002] Chawla, N.V., Bowyer, K.W., Hall, L.O., Kegelmeyer, W.P.: SMOTE: synthetic minority over-sampling technique. J. Artif. Intell. Res. 16, 321–357 (2002) https://doi.org/10.1613/jair.953 Yoon et al. 
[2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. 
[2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Murtaza, H., Ahmed, M., Khan, N.F., Murtaza, G., Zafar, S., Bano, A.: Synthetic data generation: State of the art in health care domain. Computer Science Review 48, 100546 (2023) https://doi.org/10.1016/j.cosrev.2023.100546 [22] Lautrup, A.D., Hyrup, T., Zimek, A., Schneider-Kamp, P.: Systematised review of generative modelling tools and utility metrics for fully synthetic tabular data. [under review] Gower [1971] Gower, J.C.: A general coefficient of similarity and some of its properties. Biometrics 27(4), 857–871 (1971) https://doi.org/10.2307/2528823 Kamal et al. [2019] Kamal, S., ElEleimy, M., Hegazy, D., Nasr, M.: Hepatitis C Virus (HCV) for Egyptian patients. UCI Machine Learning Repository, dataset (2019). https://doi.org/10.24432/c5989v Chawla et al. [2002] Chawla, N.V., Bowyer, K.W., Hall, L.O., Kegelmeyer, W.P.: SMOTE: synthetic minority over-sampling technique. J. Artif. Intell. Res. 16, 321–357 (2002) https://doi.org/10.1613/jair.953 Yoon et al. [2020] Yoon, J., Drumright, L.N., Schaar, M.: Anonymization through data synthesis using generative adversarial networks (ADS-GAN). IEEE J. Biomed. Health Informatics 24(8), 2378–2388 (2020) https://doi.org/10.1109/JBHI.2020.2980262 Xu et al. [2019] Xu, L., Skoularidou, M., Cuesta-Infante, A., Veeramachaneni, K.: Modeling tabular data using conditional GAN. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 7333–7343 (2019) Goodfellow et al. [2014] Goodfellow, I.J., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., Bengio, Y.: Generative Adversarial Networks. 
arXiv, preprint (2014). https://doi.org/10.48550/arXiv.1406.2661 Sun and Erath [2015] Sun, L., Erath, A.: A bayesian network approach for population synthesis. Transportation Research Part C: Emerging Technologies 61, 49–62 (2015) https://doi.org/10.1016/j.trc.2015.10.010 Reiter [2005] Reiter, J.P.: Using cart to generate partially synthetic public use microdata. Journal of Official Statistics 21(3), 441–462 (2005) Drechsler and Reiter [2011] Drechsler, J., Reiter, J.P.: An empirical evaluation of easily implemented, nonparametric methods for generating synthetic datasets. Comput. Stat. Data Anal. 55(12), 3232–3243 (2011) European Medicines Agency [2018] European Medicines Agency: External guidance on the implementation of the European Medicines Agency policy on the publication of clinical data for medicinal products for human use. https://www.ema.europa.eu/en/human-regulatory/marketing-authorisation/clinical-data-publication/support-industry/external-guidance-implementation-european-medicines-agency-policy-publication-clinical-data (2018) Health Canada [2019] Health Canada: Public Release of Clinical Information: guidance document. https://www.canada.ca/en/health-canada/services/drug-health-product-review-approval/profile-public-release-clinical-information-guidance/document.html (2019) Karr et al. [2006] Karr, A.F., Kohnen, C.N., Oganian, A., Reiter, J.P., Sanil, A.P.: A framework for evaluating the utility of data altered to protect confidentiality. The American Statistician 60(3), 224–232 (2006) https://doi.org/10.1198/000313006x124640 Zhu et al. [2022] Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). 
https://doi.org/10.1109/BigData55660.2022.10020639 Villani [2009] Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. [2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. 
In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Lautrup, A.D., Hyrup, T., Zimek, A., Schneider-Kamp, P.: Systematised review of generative modelling tools and utility metrics for fully synthetic tabular data. [under review] Gower [1971] Gower, J.C.: A general coefficient of similarity and some of its properties. Biometrics 27(4), 857–871 (1971) https://doi.org/10.2307/2528823 Kamal et al. [2019] Kamal, S., ElEleimy, M., Hegazy, D., Nasr, M.: Hepatitis C Virus (HCV) for Egyptian patients. UCI Machine Learning Repository, dataset (2019). https://doi.org/10.24432/c5989v Chawla et al. [2002] Chawla, N.V., Bowyer, K.W., Hall, L.O., Kegelmeyer, W.P.: SMOTE: synthetic minority over-sampling technique. J. Artif. Intell. Res. 16, 321–357 (2002) https://doi.org/10.1613/jair.953 Yoon et al. [2020] Yoon, J., Drumright, L.N., Schaar, M.: Anonymization through data synthesis using generative adversarial networks (ADS-GAN). IEEE J. Biomed. 
Health Informatics 24(8), 2378–2388 (2020) https://doi.org/10.1109/JBHI.2020.2980262 Xu et al. [2019] Xu, L., Skoularidou, M., Cuesta-Infante, A., Veeramachaneni, K.: Modeling tabular data using conditional GAN. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 7333–7343 (2019) Goodfellow et al. [2014] Goodfellow, I.J., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., Bengio, Y.: Generative Adversarial Networks. arXiv, preprint (2014). https://doi.org/10.48550/arXiv.1406.2661 Sun and Erath [2015] Sun, L., Erath, A.: A bayesian network approach for population synthesis. Transportation Research Part C: Emerging Technologies 61, 49–62 (2015) https://doi.org/10.1016/j.trc.2015.10.010 Reiter [2005] Reiter, J.P.: Using cart to generate partially synthetic public use microdata. Journal of Official Statistics 21(3), 441–462 (2005) Drechsler and Reiter [2011] Drechsler, J., Reiter, J.P.: An empirical evaluation of easily implemented, nonparametric methods for generating synthetic datasets. Comput. Stat. Data Anal. 55(12), 3232–3243 (2011) European Medicines Agency [2018] European Medicines Agency: External guidance on the implementation of the European Medicines Agency policy on the publication of clinical data for medicinal products for human use. https://www.ema.europa.eu/en/human-regulatory/marketing-authorisation/clinical-data-publication/support-industry/external-guidance-implementation-european-medicines-agency-policy-publication-clinical-data (2018) Health Canada [2019] Health Canada: Public Release of Clinical Information: guidance document. https://www.canada.ca/en/health-canada/services/drug-health-product-review-approval/profile-public-release-clinical-information-guidance/document.html (2019) Karr et al. 
[2006] Karr, A.F., Kohnen, C.N., Oganian, A., Reiter, J.P., Sanil, A.P.: A framework for evaluating the utility of data altered to protect confidentiality. The American Statistician 60(3), 224–232 (2006) https://doi.org/10.1198/000313006x124640 Zhu et al. [2022] Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). https://doi.org/10.1109/BigData55660.2022.10020639 Villani [2009] Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. [2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. 
[2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Gower, J.C.: A general coefficient of similarity and some of its properties. Biometrics 27(4), 857–871 (1971) https://doi.org/10.2307/2528823 Kamal et al. 
[2019] Kamal, S., ElEleimy, M., Hegazy, D., Nasr, M.: Hepatitis C Virus (HCV) for Egyptian patients. UCI Machine Learning Repository, dataset (2019). https://doi.org/10.24432/c5989v Chawla et al. [2002] Chawla, N.V., Bowyer, K.W., Hall, L.O., Kegelmeyer, W.P.: SMOTE: synthetic minority over-sampling technique. J. Artif. Intell. Res. 16, 321–357 (2002) https://doi.org/10.1613/jair.953 Yoon et al. [2020] Yoon, J., Drumright, L.N., Schaar, M.: Anonymization through data synthesis using generative adversarial networks (ADS-GAN). IEEE J. Biomed. Health Informatics 24(8), 2378–2388 (2020) https://doi.org/10.1109/JBHI.2020.2980262 Xu et al. [2019] Xu, L., Skoularidou, M., Cuesta-Infante, A., Veeramachaneni, K.: Modeling tabular data using conditional GAN. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 7333–7343 (2019) Goodfellow et al. [2014] Goodfellow, I.J., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., Bengio, Y.: Generative Adversarial Networks. arXiv, preprint (2014). https://doi.org/10.48550/arXiv.1406.2661 Sun and Erath [2015] Sun, L., Erath, A.: A bayesian network approach for population synthesis. Transportation Research Part C: Emerging Technologies 61, 49–62 (2015) https://doi.org/10.1016/j.trc.2015.10.010 Reiter [2005] Reiter, J.P.: Using cart to generate partially synthetic public use microdata. Journal of Official Statistics 21(3), 441–462 (2005) Drechsler and Reiter [2011] Drechsler, J., Reiter, J.P.: An empirical evaluation of easily implemented, nonparametric methods for generating synthetic datasets. Comput. Stat. Data Anal. 
55(12), 3232–3243 (2011) European Medicines Agency [2018] European Medicines Agency: External guidance on the implementation of the European Medicines Agency policy on the publication of clinical data for medicinal products for human use. https://www.ema.europa.eu/en/human-regulatory/marketing-authorisation/clinical-data-publication/support-industry/external-guidance-implementation-european-medicines-agency-policy-publication-clinical-data (2018) Health Canada [2019] Health Canada: Public Release of Clinical Information: guidance document. https://www.canada.ca/en/health-canada/services/drug-health-product-review-approval/profile-public-release-clinical-information-guidance/document.html (2019) Karr et al. [2006] Karr, A.F., Kohnen, C.N., Oganian, A., Reiter, J.P., Sanil, A.P.: A framework for evaluating the utility of data altered to protect confidentiality. The American Statistician 60(3), 224–232 (2006) https://doi.org/10.1198/000313006x124640 Zhu et al. [2022] Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). https://doi.org/10.1109/BigData55660.2022.10020639 Villani [2009] Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. [2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. 
IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. 
[2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Kamal, S., ElEleimy, M., Hegazy, D., Nasr, M.: Hepatitis C Virus (HCV) for Egyptian patients. UCI Machine Learning Repository, dataset (2019). https://doi.org/10.24432/c5989v Chawla et al. [2002] Chawla, N.V., Bowyer, K.W., Hall, L.O., Kegelmeyer, W.P.: SMOTE: synthetic minority over-sampling technique. J. Artif. Intell. Res. 16, 321–357 (2002) https://doi.org/10.1613/jair.953 Yoon et al. [2020] Yoon, J., Drumright, L.N., Schaar, M.: Anonymization through data synthesis using generative adversarial networks (ADS-GAN). IEEE J. Biomed. Health Informatics 24(8), 2378–2388 (2020) https://doi.org/10.1109/JBHI.2020.2980262 Xu et al. [2019] Xu, L., Skoularidou, M., Cuesta-Infante, A., Veeramachaneni, K.: Modeling tabular data using conditional GAN. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 7333–7343 (2019) Goodfellow et al. [2014] Goodfellow, I.J., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., Bengio, Y.: Generative Adversarial Networks. arXiv, preprint (2014). https://doi.org/10.48550/arXiv.1406.2661 Sun and Erath [2015] Sun, L., Erath, A.: A bayesian network approach for population synthesis. 
Transportation Research Part C: Emerging Technologies 61, 49–62 (2015) https://doi.org/10.1016/j.trc.2015.10.010 Reiter [2005] Reiter, J.P.: Using cart to generate partially synthetic public use microdata. Journal of Official Statistics 21(3), 441–462 (2005) Drechsler and Reiter [2011] Drechsler, J., Reiter, J.P.: An empirical evaluation of easily implemented, nonparametric methods for generating synthetic datasets. Comput. Stat. Data Anal. 55(12), 3232–3243 (2011) European Medicines Agency [2018] European Medicines Agency: External guidance on the implementation of the European Medicines Agency policy on the publication of clinical data for medicinal products for human use. https://www.ema.europa.eu/en/human-regulatory/marketing-authorisation/clinical-data-publication/support-industry/external-guidance-implementation-european-medicines-agency-policy-publication-clinical-data (2018) Health Canada [2019] Health Canada: Public Release of Clinical Information: guidance document. https://www.canada.ca/en/health-canada/services/drug-health-product-review-approval/profile-public-release-clinical-information-guidance/document.html (2019) Karr et al. [2006] Karr, A.F., Kohnen, C.N., Oganian, A., Reiter, J.P., Sanil, A.P.: A framework for evaluating the utility of data altered to protect confidentiality. The American Statistician 60(3), 224–232 (2006) https://doi.org/10.1198/000313006x124640 Zhu et al. [2022] Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). https://doi.org/10.1109/BigData55660.2022.10020639 Villani [2009] Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. 
In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). https://doi.org/10.1109/BigData55660.2022.10020639 Villani [2009] Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. [2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 
14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Health Canada: Public Release of Clinical Information: guidance document. https://www.canada.ca/en/health-canada/services/drug-health-product-review-approval/profile-public-release-clinical-information-guidance/document.html (2019) Karr et al. [2006] Karr, A.F., Kohnen, C.N., Oganian, A., Reiter, J.P., Sanil, A.P.: A framework for evaluating the utility of data altered to protect confidentiality. The American Statistician 60(3), 224–232 (2006) https://doi.org/10.1198/000313006x124640 Zhu et al. [2022] Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. 
In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). https://doi.org/10.1109/BigData55660.2022.10020639 Villani [2009] Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. [2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 
14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Karr, A.F., Kohnen, C.N., Oganian, A., Reiter, J.P., Sanil, A.P.: A framework for evaluating the utility of data altered to protect confidentiality. The American Statistician 60(3), 224–232 (2006) https://doi.org/10.1198/000313006x124640 Zhu et al. [2022] Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). 
https://doi.org/10.1109/BigData55660.2022.10020639 Villani [2009] Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. [2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. 
In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). https://doi.org/10.1109/BigData55660.2022.10020639 Villani [2009] Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. [2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. 
[2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. 
[2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. [2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. 
Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. 
[2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. 
[2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. 
[2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. 
Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). 
https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. 
[2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. 
Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. 
[2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. 
[2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. 
[2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083
  6. Nowok, B., Raab, G.M., Dibben, C.: synthpop: Bespoke creation of synthetic data in R. Journal of Statistical Software 74(11) (2016) https://doi.org/10.18637/jss.v074.i11
  7. Rankin, D., Black, M., Bond, R., Wallace, J., Mulvenna, M., Epelde, G.: Reliability of supervised machine learning using synthetic data in health care: Model to preserve privacy for data sharing. JMIR Medical Informatics 8(7), 18910 (2020) https://doi.org/10.2196/18910
  8. Breugel, B., Kyono, T., Berrevoets, J., Schaar, M.: DECAF: generating fair synthetic data using causally-aware generative networks. In: Ranzato, M., Beygelzimer, A., Dauphin, Y.N., Liang, P., Vaughan, J.W. (eds.) Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6-14, 2021, Virtual, pp. 22221–22233 (2021)
  9. Dankar, F.K., Ibrahim, M.K., Ismail, L.: A multi-dimensional evaluation of synthetic data generators. IEEE Access 10, 11147–11158 (2022) https://doi.org/10.1109/access.2022.3144765
https://doi.org/10.1007/978-3-030-61146-0_26 Lenz et al. [2021] Lenz, S., Hess, M., Binder, H.: Deep generative models in DataSHIELD. BMC Medical Research Methodology 21(1) (2021) https://doi.org/10.1186/s12874-021-01237-6 Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X., El-Hussuna, A.: Utility metrics for evaluating synthetic health data generation methods: Validation study. JMIR Medical Informatics 10(4), 35734 (2022) https://doi.org/10.2196/35734 Qian et al. [2023] Qian, Z., Cebere, B., Schaar, M.: Synthcity: facilitating innovative use cases of synthetic data in different data modalities. arXiv, preprint (2023). https://doi.org/10.48550/arXiv.2301.07573 DataCebo, Inc. [2023] DataCebo, Inc.: Synthetic Data Metrics. (2023). DataCebo, Inc.. Version 0.9.3. https://docs.sdv.dev/sdmetrics/ Brenninkmeijer [2021] Brenninkmeijer, B.: Table Evaluator. GitHub (2021) Yale et al. [2019] Yale, A., Dash, S., Dutta, R., Guyon, I., Pavao, A., Bennett, K.P.: Privacy preserving synthetic health data. In: 27th European Symposium on Artificial Neural Networks, ESANN 2019, Bruges, Belgium, April 24-26, 2019 (2019) Yan et al. [2022] Yan, C., Yan, Y., Wan, Z., Zhang, Z., Omberg, L., Guinney, J., Mooney, S.D., Malin, B.A.: A multifaceted benchmarking of synthetic electronic health record generation models. Nature Communications 13(1) (2022) https://doi.org/10.1038/s41467-022-35295-1 Murtaza et al. [2023] Murtaza, H., Ahmed, M., Khan, N.F., Murtaza, G., Zafar, S., Bano, A.: Synthetic data generation: State of the art in health care domain. Computer Science Review 48, 100546 (2023) https://doi.org/10.1016/j.cosrev.2023.100546 [22] Lautrup, A.D., Hyrup, T., Zimek, A., Schneider-Kamp, P.: Systematised review of generative modelling tools and utility metrics for fully synthetic tabular data. [under review] Gower [1971] Gower, J.C.: A general coefficient of similarity and some of its properties. Biometrics 27(4), 857–871 (1971) https://doi.org/10.2307/2528823 Kamal et al. 
[2019] Kamal, S., ElEleimy, M., Hegazy, D., Nasr, M.: Hepatitis C Virus (HCV) for Egyptian patients. UCI Machine Learning Repository, dataset (2019). https://doi.org/10.24432/c5989v Chawla et al. [2002] Chawla, N.V., Bowyer, K.W., Hall, L.O., Kegelmeyer, W.P.: SMOTE: synthetic minority over-sampling technique. J. Artif. Intell. Res. 16, 321–357 (2002) https://doi.org/10.1613/jair.953 Yoon et al. [2020] Yoon, J., Drumright, L.N., Schaar, M.: Anonymization through data synthesis using generative adversarial networks (ADS-GAN). IEEE J. Biomed. Health Informatics 24(8), 2378–2388 (2020) https://doi.org/10.1109/JBHI.2020.2980262 Xu et al. [2019] Xu, L., Skoularidou, M., Cuesta-Infante, A., Veeramachaneni, K.: Modeling tabular data using conditional GAN. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 7333–7343 (2019) Goodfellow et al. [2014] Goodfellow, I.J., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., Bengio, Y.: Generative Adversarial Networks. arXiv, preprint (2014). https://doi.org/10.48550/arXiv.1406.2661 Sun and Erath [2015] Sun, L., Erath, A.: A bayesian network approach for population synthesis. Transportation Research Part C: Emerging Technologies 61, 49–62 (2015) https://doi.org/10.1016/j.trc.2015.10.010 Reiter [2005] Reiter, J.P.: Using cart to generate partially synthetic public use microdata. Journal of Official Statistics 21(3), 441–462 (2005) Drechsler and Reiter [2011] Drechsler, J., Reiter, J.P.: An empirical evaluation of easily implemented, nonparametric methods for generating synthetic datasets. Comput. Stat. Data Anal. 
55(12), 3232–3243 (2011) European Medicines Agency [2018] European Medicines Agency: External guidance on the implementation of the European Medicines Agency policy on the publication of clinical data for medicinal products for human use. https://www.ema.europa.eu/en/human-regulatory/marketing-authorisation/clinical-data-publication/support-industry/external-guidance-implementation-european-medicines-agency-policy-publication-clinical-data (2018) Health Canada [2019] Health Canada: Public Release of Clinical Information: guidance document. https://www.canada.ca/en/health-canada/services/drug-health-product-review-approval/profile-public-release-clinical-information-guidance/document.html (2019) Karr et al. [2006] Karr, A.F., Kohnen, C.N., Oganian, A., Reiter, J.P., Sanil, A.P.: A framework for evaluating the utility of data altered to protect confidentiality. The American Statistician 60(3), 224–232 (2006) https://doi.org/10.1198/000313006x124640 Zhu et al. [2022] Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). https://doi.org/10.1109/BigData55660.2022.10020639 Villani [2009] Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. [2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. 
IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. 
[2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Figueira, A., Vaz, B.: Survey on synthetic data generation, evaluation methods and GANs. Mathematics 10(15), 2733 (2022) https://doi.org/10.3390/math10152733 Dwork and Roth [2013] Dwork, C., Roth, A.: The algorithmic foundations of differential privacy. Foundations and Trends® in Theoretical Computer Science 9(3-4), 211–487 (2013) https://doi.org/10.1561/0400000042 Yale et al. [2020] Yale, A., Dash, S., Bhanot, K., Guyon, I., Erickson, J.S., Bennett, K.P.: Synthesizing quality open data assets from private health research studies. In: Abramowicz, W., Klein, G. (eds.) Business Information Systems Workshops - BIS 2020 International Workshops, Colorado Springs, CO, USA, June 8-10, 2020, Revised Selected Papers. Lecture Notes in Business Information Processing, vol. 394, pp. 324–335 (2020). https://doi.org/10.1007/978-3-030-61146-0_26 Lenz et al. [2021] Lenz, S., Hess, M., Binder, H.: Deep generative models in DataSHIELD. BMC Medical Research Methodology 21(1) (2021) https://doi.org/10.1186/s12874-021-01237-6 Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X., El-Hussuna, A.: Utility metrics for evaluating synthetic health data generation methods: Validation study. JMIR Medical Informatics 10(4), 35734 (2022) https://doi.org/10.2196/35734 Qian et al. [2023] Qian, Z., Cebere, B., Schaar, M.: Synthcity: facilitating innovative use cases of synthetic data in different data modalities. arXiv, preprint (2023). https://doi.org/10.48550/arXiv.2301.07573 DataCebo, Inc. [2023] DataCebo, Inc.: Synthetic Data Metrics. (2023). 
DataCebo, Inc.. Version 0.9.3. https://docs.sdv.dev/sdmetrics/ Brenninkmeijer [2021] Brenninkmeijer, B.: Table Evaluator. GitHub (2021) Yale et al. [2019] Yale, A., Dash, S., Dutta, R., Guyon, I., Pavao, A., Bennett, K.P.: Privacy preserving synthetic health data. In: 27th European Symposium on Artificial Neural Networks, ESANN 2019, Bruges, Belgium, April 24-26, 2019 (2019) Yan et al. [2022] Yan, C., Yan, Y., Wan, Z., Zhang, Z., Omberg, L., Guinney, J., Mooney, S.D., Malin, B.A.: A multifaceted benchmarking of synthetic electronic health record generation models. Nature Communications 13(1) (2022) https://doi.org/10.1038/s41467-022-35295-1 Murtaza et al. [2023] Murtaza, H., Ahmed, M., Khan, N.F., Murtaza, G., Zafar, S., Bano, A.: Synthetic data generation: State of the art in health care domain. Computer Science Review 48, 100546 (2023) https://doi.org/10.1016/j.cosrev.2023.100546 [22] Lautrup, A.D., Hyrup, T., Zimek, A., Schneider-Kamp, P.: Systematised review of generative modelling tools and utility metrics for fully synthetic tabular data. [under review] Gower [1971] Gower, J.C.: A general coefficient of similarity and some of its properties. Biometrics 27(4), 857–871 (1971) https://doi.org/10.2307/2528823 Kamal et al. [2019] Kamal, S., ElEleimy, M., Hegazy, D., Nasr, M.: Hepatitis C Virus (HCV) for Egyptian patients. UCI Machine Learning Repository, dataset (2019). https://doi.org/10.24432/c5989v Chawla et al. [2002] Chawla, N.V., Bowyer, K.W., Hall, L.O., Kegelmeyer, W.P.: SMOTE: synthetic minority over-sampling technique. J. Artif. Intell. Res. 16, 321–357 (2002) https://doi.org/10.1613/jair.953 Yoon et al. [2020] Yoon, J., Drumright, L.N., Schaar, M.: Anonymization through data synthesis using generative adversarial networks (ADS-GAN). IEEE J. Biomed. Health Informatics 24(8), 2378–2388 (2020) https://doi.org/10.1109/JBHI.2020.2980262 Xu et al. 
[2019] Xu, L., Skoularidou, M., Cuesta-Infante, A., Veeramachaneni, K.: Modeling tabular data using conditional GAN. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 7333–7343 (2019) Goodfellow et al. [2014] Goodfellow, I.J., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., Bengio, Y.: Generative Adversarial Networks. arXiv, preprint (2014). https://doi.org/10.48550/arXiv.1406.2661 Sun and Erath [2015] Sun, L., Erath, A.: A bayesian network approach for population synthesis. Transportation Research Part C: Emerging Technologies 61, 49–62 (2015) https://doi.org/10.1016/j.trc.2015.10.010 Reiter [2005] Reiter, J.P.: Using cart to generate partially synthetic public use microdata. Journal of Official Statistics 21(3), 441–462 (2005) Drechsler and Reiter [2011] Drechsler, J., Reiter, J.P.: An empirical evaluation of easily implemented, nonparametric methods for generating synthetic datasets. Comput. Stat. Data Anal. 55(12), 3232–3243 (2011) European Medicines Agency [2018] European Medicines Agency: External guidance on the implementation of the European Medicines Agency policy on the publication of clinical data for medicinal products for human use. https://www.ema.europa.eu/en/human-regulatory/marketing-authorisation/clinical-data-publication/support-industry/external-guidance-implementation-european-medicines-agency-policy-publication-clinical-data (2018) Health Canada [2019] Health Canada: Public Release of Clinical Information: guidance document. https://www.canada.ca/en/health-canada/services/drug-health-product-review-approval/profile-public-release-clinical-information-guidance/document.html (2019) Karr et al. 
[2006] Karr, A.F., Kohnen, C.N., Oganian, A., Reiter, J.P., Sanil, A.P.: A framework for evaluating the utility of data altered to protect confidentiality. The American Statistician 60(3), 224–232 (2006) https://doi.org/10.1198/000313006x124640 Zhu et al. [2022] Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). https://doi.org/10.1109/BigData55660.2022.10020639 Villani [2009] Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. [2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. 
[2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Dwork, C., Roth, A.: The algorithmic foundations of differential privacy. Foundations and Trends® in Theoretical Computer Science 9(3-4), 211–487 (2013) https://doi.org/10.1561/0400000042 Yale et al. 
[2020] Yale, A., Dash, S., Bhanot, K., Guyon, I., Erickson, J.S., Bennett, K.P.: Synthesizing quality open data assets from private health research studies. In: Abramowicz, W., Klein, G. (eds.) Business Information Systems Workshops - BIS 2020 International Workshops, Colorado Springs, CO, USA, June 8-10, 2020, Revised Selected Papers. Lecture Notes in Business Information Processing, vol. 394, pp. 324–335 (2020). https://doi.org/10.1007/978-3-030-61146-0_26 Lenz et al. [2021] Lenz, S., Hess, M., Binder, H.: Deep generative models in DataSHIELD. BMC Medical Research Methodology 21(1) (2021) https://doi.org/10.1186/s12874-021-01237-6 Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X., El-Hussuna, A.: Utility metrics for evaluating synthetic health data generation methods: Validation study. JMIR Medical Informatics 10(4), 35734 (2022) https://doi.org/10.2196/35734 Qian et al. [2023] Qian, Z., Cebere, B., Schaar, M.: Synthcity: facilitating innovative use cases of synthetic data in different data modalities. arXiv, preprint (2023). https://doi.org/10.48550/arXiv.2301.07573 DataCebo, Inc. [2023] DataCebo, Inc.: Synthetic Data Metrics. (2023). DataCebo, Inc.. Version 0.9.3. https://docs.sdv.dev/sdmetrics/ Brenninkmeijer [2021] Brenninkmeijer, B.: Table Evaluator. GitHub (2021) Yale et al. [2019] Yale, A., Dash, S., Dutta, R., Guyon, I., Pavao, A., Bennett, K.P.: Privacy preserving synthetic health data. In: 27th European Symposium on Artificial Neural Networks, ESANN 2019, Bruges, Belgium, April 24-26, 2019 (2019) Yan et al. [2022] Yan, C., Yan, Y., Wan, Z., Zhang, Z., Omberg, L., Guinney, J., Mooney, S.D., Malin, B.A.: A multifaceted benchmarking of synthetic electronic health record generation models. Nature Communications 13(1) (2022) https://doi.org/10.1038/s41467-022-35295-1 Murtaza et al. [2023] Murtaza, H., Ahmed, M., Khan, N.F., Murtaza, G., Zafar, S., Bano, A.: Synthetic data generation: State of the art in health care domain. 
Computer Science Review 48, 100546 (2023) https://doi.org/10.1016/j.cosrev.2023.100546 [22] Lautrup, A.D., Hyrup, T., Zimek, A., Schneider-Kamp, P.: Systematised review of generative modelling tools and utility metrics for fully synthetic tabular data. [under review] Gower [1971] Gower, J.C.: A general coefficient of similarity and some of its properties. Biometrics 27(4), 857–871 (1971) https://doi.org/10.2307/2528823 Kamal et al. [2019] Kamal, S., ElEleimy, M., Hegazy, D., Nasr, M.: Hepatitis C Virus (HCV) for Egyptian patients. UCI Machine Learning Repository, dataset (2019). https://doi.org/10.24432/c5989v Chawla et al. [2002] Chawla, N.V., Bowyer, K.W., Hall, L.O., Kegelmeyer, W.P.: SMOTE: synthetic minority over-sampling technique. J. Artif. Intell. Res. 16, 321–357 (2002) https://doi.org/10.1613/jair.953 Yoon et al. [2020] Yoon, J., Drumright, L.N., Schaar, M.: Anonymization through data synthesis using generative adversarial networks (ADS-GAN). IEEE J. Biomed. Health Informatics 24(8), 2378–2388 (2020) https://doi.org/10.1109/JBHI.2020.2980262 Xu et al. [2019] Xu, L., Skoularidou, M., Cuesta-Infante, A., Veeramachaneni, K.: Modeling tabular data using conditional GAN. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 7333–7343 (2019) Goodfellow et al. [2014] Goodfellow, I.J., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., Bengio, Y.: Generative Adversarial Networks. arXiv, preprint (2014). https://doi.org/10.48550/arXiv.1406.2661 Sun and Erath [2015] Sun, L., Erath, A.: A bayesian network approach for population synthesis. 
Transportation Research Part C: Emerging Technologies 61, 49–62 (2015) https://doi.org/10.1016/j.trc.2015.10.010 Reiter [2005] Reiter, J.P.: Using cart to generate partially synthetic public use microdata. Journal of Official Statistics 21(3), 441–462 (2005) Drechsler and Reiter [2011] Drechsler, J., Reiter, J.P.: An empirical evaluation of easily implemented, nonparametric methods for generating synthetic datasets. Comput. Stat. Data Anal. 55(12), 3232–3243 (2011) European Medicines Agency [2018] European Medicines Agency: External guidance on the implementation of the European Medicines Agency policy on the publication of clinical data for medicinal products for human use. https://www.ema.europa.eu/en/human-regulatory/marketing-authorisation/clinical-data-publication/support-industry/external-guidance-implementation-european-medicines-agency-policy-publication-clinical-data (2018) Health Canada [2019] Health Canada: Public Release of Clinical Information: guidance document. https://www.canada.ca/en/health-canada/services/drug-health-product-review-approval/profile-public-release-clinical-information-guidance/document.html (2019) Karr et al. [2006] Karr, A.F., Kohnen, C.N., Oganian, A., Reiter, J.P., Sanil, A.P.: A framework for evaluating the utility of data altered to protect confidentiality. The American Statistician 60(3), 224–232 (2006) https://doi.org/10.1198/000313006x124640 Zhu et al. [2022] Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). https://doi.org/10.1109/BigData55660.2022.10020639 Villani [2009] Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. 
[2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). 
https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Yale, A., Dash, S., Bhanot, K., Guyon, I., Erickson, J.S., Bennett, K.P.: Synthesizing quality open data assets from private health research studies. In: Abramowicz, W., Klein, G. (eds.) Business Information Systems Workshops - BIS 2020 International Workshops, Colorado Springs, CO, USA, June 8-10, 2020, Revised Selected Papers. Lecture Notes in Business Information Processing, vol. 394, pp. 324–335 (2020). https://doi.org/10.1007/978-3-030-61146-0_26 Lenz et al. [2021] Lenz, S., Hess, M., Binder, H.: Deep generative models in DataSHIELD. BMC Medical Research Methodology 21(1) (2021) https://doi.org/10.1186/s12874-021-01237-6 Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X., El-Hussuna, A.: Utility metrics for evaluating synthetic health data generation methods: Validation study. JMIR Medical Informatics 10(4), 35734 (2022) https://doi.org/10.2196/35734 Qian et al. [2023] Qian, Z., Cebere, B., Schaar, M.: Synthcity: facilitating innovative use cases of synthetic data in different data modalities. arXiv, preprint (2023). 
https://doi.org/10.48550/arXiv.2301.07573 DataCebo, Inc. [2023] DataCebo, Inc.: Synthetic Data Metrics. (2023). DataCebo, Inc.. Version 0.9.3. https://docs.sdv.dev/sdmetrics/ Brenninkmeijer [2021] Brenninkmeijer, B.: Table Evaluator. GitHub (2021) Yale et al. [2019] Yale, A., Dash, S., Dutta, R., Guyon, I., Pavao, A., Bennett, K.P.: Privacy preserving synthetic health data. In: 27th European Symposium on Artificial Neural Networks, ESANN 2019, Bruges, Belgium, April 24-26, 2019 (2019) Yan et al. [2022] Yan, C., Yan, Y., Wan, Z., Zhang, Z., Omberg, L., Guinney, J., Mooney, S.D., Malin, B.A.: A multifaceted benchmarking of synthetic electronic health record generation models. Nature Communications 13(1) (2022) https://doi.org/10.1038/s41467-022-35295-1 Murtaza et al. [2023] Murtaza, H., Ahmed, M., Khan, N.F., Murtaza, G., Zafar, S., Bano, A.: Synthetic data generation: State of the art in health care domain. Computer Science Review 48, 100546 (2023) https://doi.org/10.1016/j.cosrev.2023.100546 [22] Lautrup, A.D., Hyrup, T., Zimek, A., Schneider-Kamp, P.: Systematised review of generative modelling tools and utility metrics for fully synthetic tabular data. [under review] Gower [1971] Gower, J.C.: A general coefficient of similarity and some of its properties. Biometrics 27(4), 857–871 (1971) https://doi.org/10.2307/2528823 Kamal et al. [2019] Kamal, S., ElEleimy, M., Hegazy, D., Nasr, M.: Hepatitis C Virus (HCV) for Egyptian patients. UCI Machine Learning Repository, dataset (2019). https://doi.org/10.24432/c5989v Chawla et al. [2002] Chawla, N.V., Bowyer, K.W., Hall, L.O., Kegelmeyer, W.P.: SMOTE: synthetic minority over-sampling technique. J. Artif. Intell. Res. 16, 321–357 (2002) https://doi.org/10.1613/jair.953 Yoon et al. [2020] Yoon, J., Drumright, L.N., Schaar, M.: Anonymization through data synthesis using generative adversarial networks (ADS-GAN). IEEE J. Biomed. Health Informatics 24(8), 2378–2388 (2020) https://doi.org/10.1109/JBHI.2020.2980262 Xu et al. 
[2019] Xu, L., Skoularidou, M., Cuesta-Infante, A., Veeramachaneni, K.: Modeling tabular data using conditional GAN. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 7333–7343 (2019) Goodfellow et al. [2014] Goodfellow, I.J., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., Bengio, Y.: Generative Adversarial Networks. arXiv, preprint (2014). https://doi.org/10.48550/arXiv.1406.2661 Sun and Erath [2015] Sun, L., Erath, A.: A bayesian network approach for population synthesis. Transportation Research Part C: Emerging Technologies 61, 49–62 (2015) https://doi.org/10.1016/j.trc.2015.10.010 Reiter [2005] Reiter, J.P.: Using cart to generate partially synthetic public use microdata. Journal of Official Statistics 21(3), 441–462 (2005) Drechsler and Reiter [2011] Drechsler, J., Reiter, J.P.: An empirical evaluation of easily implemented, nonparametric methods for generating synthetic datasets. Comput. Stat. Data Anal. 55(12), 3232–3243 (2011) European Medicines Agency [2018] European Medicines Agency: External guidance on the implementation of the European Medicines Agency policy on the publication of clinical data for medicinal products for human use. https://www.ema.europa.eu/en/human-regulatory/marketing-authorisation/clinical-data-publication/support-industry/external-guidance-implementation-european-medicines-agency-policy-publication-clinical-data (2018) Health Canada [2019] Health Canada: Public Release of Clinical Information: guidance document. https://www.canada.ca/en/health-canada/services/drug-health-product-review-approval/profile-public-release-clinical-information-guidance/document.html (2019) Karr et al. 
[2006] Karr, A.F., Kohnen, C.N., Oganian, A., Reiter, J.P., Sanil, A.P.: A framework for evaluating the utility of data altered to protect confidentiality. The American Statistician 60(3), 224–232 (2006) https://doi.org/10.1198/000313006x124640 Zhu et al. [2022] Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). https://doi.org/10.1109/BigData55660.2022.10020639 Villani [2009] Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. [2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. 
[2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Lenz, S., Hess, M., Binder, H.: Deep generative models in DataSHIELD. BMC Medical Research Methodology 21(1) (2021) https://doi.org/10.1186/s12874-021-01237-6 Emam et al. 
[2022] Emam, K.E., Mosquera, L., Fang, X., El-Hussuna, A.: Utility metrics for evaluating synthetic health data generation methods: Validation study. JMIR Medical Informatics 10(4), 35734 (2022) https://doi.org/10.2196/35734 Qian et al. [2023] Qian, Z., Cebere, B., Schaar, M.: Synthcity: facilitating innovative use cases of synthetic data in different data modalities. arXiv, preprint (2023). https://doi.org/10.48550/arXiv.2301.07573 DataCebo, Inc. [2023] DataCebo, Inc.: Synthetic Data Metrics. (2023). DataCebo, Inc.. Version 0.9.3. https://docs.sdv.dev/sdmetrics/ Brenninkmeijer [2021] Brenninkmeijer, B.: Table Evaluator. GitHub (2021) Yale et al. [2019] Yale, A., Dash, S., Dutta, R., Guyon, I., Pavao, A., Bennett, K.P.: Privacy preserving synthetic health data. In: 27th European Symposium on Artificial Neural Networks, ESANN 2019, Bruges, Belgium, April 24-26, 2019 (2019) Yan et al. [2022] Yan, C., Yan, Y., Wan, Z., Zhang, Z., Omberg, L., Guinney, J., Mooney, S.D., Malin, B.A.: A multifaceted benchmarking of synthetic electronic health record generation models. Nature Communications 13(1) (2022) https://doi.org/10.1038/s41467-022-35295-1 Murtaza et al. [2023] Murtaza, H., Ahmed, M., Khan, N.F., Murtaza, G., Zafar, S., Bano, A.: Synthetic data generation: State of the art in health care domain. Computer Science Review 48, 100546 (2023) https://doi.org/10.1016/j.cosrev.2023.100546 [22] Lautrup, A.D., Hyrup, T., Zimek, A., Schneider-Kamp, P.: Systematised review of generative modelling tools and utility metrics for fully synthetic tabular data. [under review] Gower [1971] Gower, J.C.: A general coefficient of similarity and some of its properties. Biometrics 27(4), 857–871 (1971) https://doi.org/10.2307/2528823 Kamal et al. [2019] Kamal, S., ElEleimy, M., Hegazy, D., Nasr, M.: Hepatitis C Virus (HCV) for Egyptian patients. UCI Machine Learning Repository, dataset (2019). https://doi.org/10.24432/c5989v Chawla et al. 
[2002] Chawla, N.V., Bowyer, K.W., Hall, L.O., Kegelmeyer, W.P.: SMOTE: synthetic minority over-sampling technique. J. Artif. Intell. Res. 16, 321–357 (2002) https://doi.org/10.1613/jair.953 Yoon et al. [2020] Yoon, J., Drumright, L.N., Schaar, M.: Anonymization through data synthesis using generative adversarial networks (ADS-GAN). IEEE J. Biomed. Health Informatics 24(8), 2378–2388 (2020) https://doi.org/10.1109/JBHI.2020.2980262 Xu et al. [2019] Xu, L., Skoularidou, M., Cuesta-Infante, A., Veeramachaneni, K.: Modeling tabular data using conditional GAN. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 7333–7343 (2019) Goodfellow et al. [2014] Goodfellow, I.J., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., Bengio, Y.: Generative Adversarial Networks. arXiv, preprint (2014). https://doi.org/10.48550/arXiv.1406.2661 Sun and Erath [2015] Sun, L., Erath, A.: A bayesian network approach for population synthesis. Transportation Research Part C: Emerging Technologies 61, 49–62 (2015) https://doi.org/10.1016/j.trc.2015.10.010 Reiter [2005] Reiter, J.P.: Using cart to generate partially synthetic public use microdata. Journal of Official Statistics 21(3), 441–462 (2005) Drechsler and Reiter [2011] Drechsler, J., Reiter, J.P.: An empirical evaluation of easily implemented, nonparametric methods for generating synthetic datasets. Comput. Stat. Data Anal. 55(12), 3232–3243 (2011) European Medicines Agency [2018] European Medicines Agency: External guidance on the implementation of the European Medicines Agency policy on the publication of clinical data for medicinal products for human use. 
https://www.ema.europa.eu/en/human-regulatory/marketing-authorisation/clinical-data-publication/support-industry/external-guidance-implementation-european-medicines-agency-policy-publication-clinical-data (2018) Health Canada [2019] Health Canada: Public Release of Clinical Information: guidance document. https://www.canada.ca/en/health-canada/services/drug-health-product-review-approval/profile-public-release-clinical-information-guidance/document.html (2019) Karr et al. [2006] Karr, A.F., Kohnen, C.N., Oganian, A., Reiter, J.P., Sanil, A.P.: A framework for evaluating the utility of data altered to protect confidentiality. The American Statistician 60(3), 224–232 (2006) https://doi.org/10.1198/000313006x124640 Zhu et al. [2022] Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). https://doi.org/10.1109/BigData55660.2022.10020639 Villani [2009] Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. [2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. 
[2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. 
[2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Emam, K.E., Mosquera, L., Fang, X., El-Hussuna, A.: Utility metrics for evaluating synthetic health data generation methods: Validation study. JMIR Medical Informatics 10(4), 35734 (2022) https://doi.org/10.2196/35734 Qian et al. [2023] Qian, Z., Cebere, B., Schaar, M.: Synthcity: facilitating innovative use cases of synthetic data in different data modalities. arXiv, preprint (2023). https://doi.org/10.48550/arXiv.2301.07573 DataCebo, Inc. [2023] DataCebo, Inc.: Synthetic Data Metrics. (2023). DataCebo, Inc.. Version 0.9.3. https://docs.sdv.dev/sdmetrics/ Brenninkmeijer [2021] Brenninkmeijer, B.: Table Evaluator. GitHub (2021) Yale et al. [2019] Yale, A., Dash, S., Dutta, R., Guyon, I., Pavao, A., Bennett, K.P.: Privacy preserving synthetic health data. In: 27th European Symposium on Artificial Neural Networks, ESANN 2019, Bruges, Belgium, April 24-26, 2019 (2019) Yan et al. [2022] Yan, C., Yan, Y., Wan, Z., Zhang, Z., Omberg, L., Guinney, J., Mooney, S.D., Malin, B.A.: A multifaceted benchmarking of synthetic electronic health record generation models. Nature Communications 13(1) (2022) https://doi.org/10.1038/s41467-022-35295-1 Murtaza et al. [2023] Murtaza, H., Ahmed, M., Khan, N.F., Murtaza, G., Zafar, S., Bano, A.: Synthetic data generation: State of the art in health care domain. Computer Science Review 48, 100546 (2023) https://doi.org/10.1016/j.cosrev.2023.100546 [22] Lautrup, A.D., Hyrup, T., Zimek, A., Schneider-Kamp, P.: Systematised review of generative modelling tools and utility metrics for fully synthetic tabular data. [under review] Gower [1971] Gower, J.C.: A general coefficient of similarity and some of its properties. Biometrics 27(4), 857–871 (1971) https://doi.org/10.2307/2528823 Kamal et al. 
[2019] Kamal, S., ElEleimy, M., Hegazy, D., Nasr, M.: Hepatitis C Virus (HCV) for Egyptian patients. UCI Machine Learning Repository, dataset (2019). https://doi.org/10.24432/c5989v Chawla et al. [2002] Chawla, N.V., Bowyer, K.W., Hall, L.O., Kegelmeyer, W.P.: SMOTE: synthetic minority over-sampling technique. J. Artif. Intell. Res. 16, 321–357 (2002) https://doi.org/10.1613/jair.953 Yoon et al. [2020] Yoon, J., Drumright, L.N., Schaar, M.: Anonymization through data synthesis using generative adversarial networks (ADS-GAN). IEEE J. Biomed. Health Informatics 24(8), 2378–2388 (2020) https://doi.org/10.1109/JBHI.2020.2980262 Xu et al. [2019] Xu, L., Skoularidou, M., Cuesta-Infante, A., Veeramachaneni, K.: Modeling tabular data using conditional GAN. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 7333–7343 (2019) Goodfellow et al. [2014] Goodfellow, I.J., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., Bengio, Y.: Generative Adversarial Networks. arXiv, preprint (2014). https://doi.org/10.48550/arXiv.1406.2661 Sun and Erath [2015] Sun, L., Erath, A.: A bayesian network approach for population synthesis. Transportation Research Part C: Emerging Technologies 61, 49–62 (2015) https://doi.org/10.1016/j.trc.2015.10.010 Reiter [2005] Reiter, J.P.: Using cart to generate partially synthetic public use microdata. Journal of Official Statistics 21(3), 441–462 (2005) Drechsler and Reiter [2011] Drechsler, J., Reiter, J.P.: An empirical evaluation of easily implemented, nonparametric methods for generating synthetic datasets. Comput. Stat. Data Anal. 
55(12), 3232–3243 (2011) European Medicines Agency [2018] European Medicines Agency: External guidance on the implementation of the European Medicines Agency policy on the publication of clinical data for medicinal products for human use. https://www.ema.europa.eu/en/human-regulatory/marketing-authorisation/clinical-data-publication/support-industry/external-guidance-implementation-european-medicines-agency-policy-publication-clinical-data (2018) Health Canada [2019] Health Canada: Public Release of Clinical Information: guidance document. https://www.canada.ca/en/health-canada/services/drug-health-product-review-approval/profile-public-release-clinical-information-guidance/document.html (2019) Karr et al. [2006] Karr, A.F., Kohnen, C.N., Oganian, A., Reiter, J.P., Sanil, A.P.: A framework for evaluating the utility of data altered to protect confidentiality. The American Statistician 60(3), 224–232 (2006) https://doi.org/10.1198/000313006x124640 Zhu et al. [2022] Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). https://doi.org/10.1109/BigData55660.2022.10020639 Villani [2009] Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. [2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. 
IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. 
[2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Qian, Z., Cebere, B., Schaar, M.: Synthcity: facilitating innovative use cases of synthetic data in different data modalities. arXiv, preprint (2023). https://doi.org/10.48550/arXiv.2301.07573 DataCebo, Inc. [2023] DataCebo, Inc.: Synthetic Data Metrics. (2023). DataCebo, Inc.. Version 0.9.3. https://docs.sdv.dev/sdmetrics/ Brenninkmeijer [2021] Brenninkmeijer, B.: Table Evaluator. GitHub (2021) Yale et al. [2019] Yale, A., Dash, S., Dutta, R., Guyon, I., Pavao, A., Bennett, K.P.: Privacy preserving synthetic health data. In: 27th European Symposium on Artificial Neural Networks, ESANN 2019, Bruges, Belgium, April 24-26, 2019 (2019) Yan et al. [2022] Yan, C., Yan, Y., Wan, Z., Zhang, Z., Omberg, L., Guinney, J., Mooney, S.D., Malin, B.A.: A multifaceted benchmarking of synthetic electronic health record generation models. Nature Communications 13(1) (2022) https://doi.org/10.1038/s41467-022-35295-1 Murtaza et al. [2023] Murtaza, H., Ahmed, M., Khan, N.F., Murtaza, G., Zafar, S., Bano, A.: Synthetic data generation: State of the art in health care domain. Computer Science Review 48, 100546 (2023) https://doi.org/10.1016/j.cosrev.2023.100546 [22] Lautrup, A.D., Hyrup, T., Zimek, A., Schneider-Kamp, P.: Systematised review of generative modelling tools and utility metrics for fully synthetic tabular data. [under review] Gower [1971] Gower, J.C.: A general coefficient of similarity and some of its properties. Biometrics 27(4), 857–871 (1971) https://doi.org/10.2307/2528823 Kamal et al. 
[2019] Kamal, S., ElEleimy, M., Hegazy, D., Nasr, M.: Hepatitis C Virus (HCV) for Egyptian patients. UCI Machine Learning Repository, dataset (2019). https://doi.org/10.24432/c5989v Chawla et al. [2002] Chawla, N.V., Bowyer, K.W., Hall, L.O., Kegelmeyer, W.P.: SMOTE: synthetic minority over-sampling technique. J. Artif. Intell. Res. 16, 321–357 (2002) https://doi.org/10.1613/jair.953 Yoon et al. [2020] Yoon, J., Drumright, L.N., Schaar, M.: Anonymization through data synthesis using generative adversarial networks (ADS-GAN). IEEE J. Biomed. Health Informatics 24(8), 2378–2388 (2020) https://doi.org/10.1109/JBHI.2020.2980262 Xu et al. [2019] Xu, L., Skoularidou, M., Cuesta-Infante, A., Veeramachaneni, K.: Modeling tabular data using conditional GAN. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 7333–7343 (2019) Goodfellow et al. [2014] Goodfellow, I.J., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., Bengio, Y.: Generative Adversarial Networks. arXiv, preprint (2014). https://doi.org/10.48550/arXiv.1406.2661 Sun and Erath [2015] Sun, L., Erath, A.: A bayesian network approach for population synthesis. Transportation Research Part C: Emerging Technologies 61, 49–62 (2015) https://doi.org/10.1016/j.trc.2015.10.010 Reiter [2005] Reiter, J.P.: Using cart to generate partially synthetic public use microdata. Journal of Official Statistics 21(3), 441–462 (2005) Drechsler and Reiter [2011] Drechsler, J., Reiter, J.P.: An empirical evaluation of easily implemented, nonparametric methods for generating synthetic datasets. Comput. Stat. Data Anal. 
55(12), 3232–3243 (2011) European Medicines Agency [2018] European Medicines Agency: External guidance on the implementation of the European Medicines Agency policy on the publication of clinical data for medicinal products for human use. https://www.ema.europa.eu/en/human-regulatory/marketing-authorisation/clinical-data-publication/support-industry/external-guidance-implementation-european-medicines-agency-policy-publication-clinical-data (2018) Health Canada [2019] Health Canada: Public Release of Clinical Information: guidance document. https://www.canada.ca/en/health-canada/services/drug-health-product-review-approval/profile-public-release-clinical-information-guidance/document.html (2019) Karr et al. [2006] Karr, A.F., Kohnen, C.N., Oganian, A., Reiter, J.P., Sanil, A.P.: A framework for evaluating the utility of data altered to protect confidentiality. The American Statistician 60(3), 224–232 (2006) https://doi.org/10.1198/000313006x124640 Zhu et al. [2022] Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). https://doi.org/10.1109/BigData55660.2022.10020639 Villani [2009] Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. [2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. 
IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. 
[2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 DataCebo, Inc.: Synthetic Data Metrics. (2023). DataCebo, Inc.. Version 0.9.3. https://docs.sdv.dev/sdmetrics/ Brenninkmeijer [2021] Brenninkmeijer, B.: Table Evaluator. GitHub (2021) Yale et al. [2019] Yale, A., Dash, S., Dutta, R., Guyon, I., Pavao, A., Bennett, K.P.: Privacy preserving synthetic health data. In: 27th European Symposium on Artificial Neural Networks, ESANN 2019, Bruges, Belgium, April 24-26, 2019 (2019) Yan et al. [2022] Yan, C., Yan, Y., Wan, Z., Zhang, Z., Omberg, L., Guinney, J., Mooney, S.D., Malin, B.A.: A multifaceted benchmarking of synthetic electronic health record generation models. Nature Communications 13(1) (2022) https://doi.org/10.1038/s41467-022-35295-1 Murtaza et al. [2023] Murtaza, H., Ahmed, M., Khan, N.F., Murtaza, G., Zafar, S., Bano, A.: Synthetic data generation: State of the art in health care domain. Computer Science Review 48, 100546 (2023) https://doi.org/10.1016/j.cosrev.2023.100546 [22] Lautrup, A.D., Hyrup, T., Zimek, A., Schneider-Kamp, P.: Systematised review of generative modelling tools and utility metrics for fully synthetic tabular data. [under review] Gower [1971] Gower, J.C.: A general coefficient of similarity and some of its properties. Biometrics 27(4), 857–871 (1971) https://doi.org/10.2307/2528823 Kamal et al. [2019] Kamal, S., ElEleimy, M., Hegazy, D., Nasr, M.: Hepatitis C Virus (HCV) for Egyptian patients. UCI Machine Learning Repository, dataset (2019). https://doi.org/10.24432/c5989v Chawla et al. 
[2002] Chawla, N.V., Bowyer, K.W., Hall, L.O., Kegelmeyer, W.P.: SMOTE: synthetic minority over-sampling technique. J. Artif. Intell. Res. 16, 321–357 (2002) https://doi.org/10.1613/jair.953 Yoon et al. [2020] Yoon, J., Drumright, L.N., Schaar, M.: Anonymization through data synthesis using generative adversarial networks (ADS-GAN). IEEE J. Biomed. Health Informatics 24(8), 2378–2388 (2020) https://doi.org/10.1109/JBHI.2020.2980262 Xu et al. [2019] Xu, L., Skoularidou, M., Cuesta-Infante, A., Veeramachaneni, K.: Modeling tabular data using conditional GAN. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 7333–7343 (2019) Goodfellow et al. [2014] Goodfellow, I.J., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., Bengio, Y.: Generative Adversarial Networks. arXiv, preprint (2014). https://doi.org/10.48550/arXiv.1406.2661 Sun and Erath [2015] Sun, L., Erath, A.: A bayesian network approach for population synthesis. Transportation Research Part C: Emerging Technologies 61, 49–62 (2015) https://doi.org/10.1016/j.trc.2015.10.010 Reiter [2005] Reiter, J.P.: Using cart to generate partially synthetic public use microdata. Journal of Official Statistics 21(3), 441–462 (2005) Drechsler and Reiter [2011] Drechsler, J., Reiter, J.P.: An empirical evaluation of easily implemented, nonparametric methods for generating synthetic datasets. Comput. Stat. Data Anal. 55(12), 3232–3243 (2011) European Medicines Agency [2018] European Medicines Agency: External guidance on the implementation of the European Medicines Agency policy on the publication of clinical data for medicinal products for human use. 
https://www.ema.europa.eu/en/human-regulatory/marketing-authorisation/clinical-data-publication/support-industry/external-guidance-implementation-european-medicines-agency-policy-publication-clinical-data (2018) Health Canada [2019] Health Canada: Public Release of Clinical Information: guidance document. https://www.canada.ca/en/health-canada/services/drug-health-product-review-approval/profile-public-release-clinical-information-guidance/document.html (2019) Karr et al. [2006] Karr, A.F., Kohnen, C.N., Oganian, A., Reiter, J.P., Sanil, A.P.: A framework for evaluating the utility of data altered to protect confidentiality. The American Statistician 60(3), 224–232 (2006) https://doi.org/10.1198/000313006x124640 Zhu et al. [2022] Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). https://doi.org/10.1109/BigData55660.2022.10020639 Villani [2009] Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. [2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. 
[2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. 
[2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Brenninkmeijer, B.: Table Evaluator. GitHub (2021) Yale et al. [2019] Yale, A., Dash, S., Dutta, R., Guyon, I., Pavao, A., Bennett, K.P.: Privacy preserving synthetic health data. In: 27th European Symposium on Artificial Neural Networks, ESANN 2019, Bruges, Belgium, April 24-26, 2019 (2019) Yan et al. [2022] Yan, C., Yan, Y., Wan, Z., Zhang, Z., Omberg, L., Guinney, J., Mooney, S.D., Malin, B.A.: A multifaceted benchmarking of synthetic electronic health record generation models. Nature Communications 13(1) (2022) https://doi.org/10.1038/s41467-022-35295-1 Murtaza et al. [2023] Murtaza, H., Ahmed, M., Khan, N.F., Murtaza, G., Zafar, S., Bano, A.: Synthetic data generation: State of the art in health care domain. Computer Science Review 48, 100546 (2023) https://doi.org/10.1016/j.cosrev.2023.100546 [22] Lautrup, A.D., Hyrup, T., Zimek, A., Schneider-Kamp, P.: Systematised review of generative modelling tools and utility metrics for fully synthetic tabular data. [under review] Gower [1971] Gower, J.C.: A general coefficient of similarity and some of its properties. Biometrics 27(4), 857–871 (1971) https://doi.org/10.2307/2528823 Kamal et al. [2019] Kamal, S., ElEleimy, M., Hegazy, D., Nasr, M.: Hepatitis C Virus (HCV) for Egyptian patients. UCI Machine Learning Repository, dataset (2019). https://doi.org/10.24432/c5989v Chawla et al. [2002] Chawla, N.V., Bowyer, K.W., Hall, L.O., Kegelmeyer, W.P.: SMOTE: synthetic minority over-sampling technique. J. Artif. Intell. Res. 16, 321–357 (2002) https://doi.org/10.1613/jair.953 Yoon et al. [2020] Yoon, J., Drumright, L.N., Schaar, M.: Anonymization through data synthesis using generative adversarial networks (ADS-GAN). IEEE J. Biomed. Health Informatics 24(8), 2378–2388 (2020) https://doi.org/10.1109/JBHI.2020.2980262 Xu et al. 
[2019] Xu, L., Skoularidou, M., Cuesta-Infante, A., Veeramachaneni, K.: Modeling tabular data using conditional GAN. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 7333–7343 (2019) Goodfellow et al. [2014] Goodfellow, I.J., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., Bengio, Y.: Generative Adversarial Networks. arXiv, preprint (2014). https://doi.org/10.48550/arXiv.1406.2661 Sun and Erath [2015] Sun, L., Erath, A.: A bayesian network approach for population synthesis. Transportation Research Part C: Emerging Technologies 61, 49–62 (2015) https://doi.org/10.1016/j.trc.2015.10.010 Reiter [2005] Reiter, J.P.: Using cart to generate partially synthetic public use microdata. Journal of Official Statistics 21(3), 441–462 (2005) Drechsler and Reiter [2011] Drechsler, J., Reiter, J.P.: An empirical evaluation of easily implemented, nonparametric methods for generating synthetic datasets. Comput. Stat. Data Anal. 55(12), 3232–3243 (2011) European Medicines Agency [2018] European Medicines Agency: External guidance on the implementation of the European Medicines Agency policy on the publication of clinical data for medicinal products for human use. https://www.ema.europa.eu/en/human-regulatory/marketing-authorisation/clinical-data-publication/support-industry/external-guidance-implementation-european-medicines-agency-policy-publication-clinical-data (2018) Health Canada [2019] Health Canada: Public Release of Clinical Information: guidance document. https://www.canada.ca/en/health-canada/services/drug-health-product-review-approval/profile-public-release-clinical-information-guidance/document.html (2019) Karr et al. 
[2006] Karr, A.F., Kohnen, C.N., Oganian, A., Reiter, J.P., Sanil, A.P.: A framework for evaluating the utility of data altered to protect confidentiality. The American Statistician 60(3), 224–232 (2006) https://doi.org/10.1198/000313006x124640 Zhu et al. [2022] Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). https://doi.org/10.1109/BigData55660.2022.10020639 Villani [2009] Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. [2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. 
[2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Yale, A., Dash, S., Dutta, R., Guyon, I., Pavao, A., Bennett, K.P.: Privacy preserving synthetic health data. 
In: 27th European Symposium on Artificial Neural Networks, ESANN 2019, Bruges, Belgium, April 24-26, 2019 (2019) Yan et al. [2022] Yan, C., Yan, Y., Wan, Z., Zhang, Z., Omberg, L., Guinney, J., Mooney, S.D., Malin, B.A.: A multifaceted benchmarking of synthetic electronic health record generation models. Nature Communications 13(1) (2022) https://doi.org/10.1038/s41467-022-35295-1 Murtaza et al. [2023] Murtaza, H., Ahmed, M., Khan, N.F., Murtaza, G., Zafar, S., Bano, A.: Synthetic data generation: State of the art in health care domain. Computer Science Review 48, 100546 (2023) https://doi.org/10.1016/j.cosrev.2023.100546 [22] Lautrup, A.D., Hyrup, T., Zimek, A., Schneider-Kamp, P.: Systematised review of generative modelling tools and utility metrics for fully synthetic tabular data. [under review] Gower [1971] Gower, J.C.: A general coefficient of similarity and some of its properties. Biometrics 27(4), 857–871 (1971) https://doi.org/10.2307/2528823 Kamal et al. [2019] Kamal, S., ElEleimy, M., Hegazy, D., Nasr, M.: Hepatitis C Virus (HCV) for Egyptian patients. UCI Machine Learning Repository, dataset (2019). https://doi.org/10.24432/c5989v Chawla et al. [2002] Chawla, N.V., Bowyer, K.W., Hall, L.O., Kegelmeyer, W.P.: SMOTE: synthetic minority over-sampling technique. J. Artif. Intell. Res. 16, 321–357 (2002) https://doi.org/10.1613/jair.953 Yoon et al. [2020] Yoon, J., Drumright, L.N., Schaar, M.: Anonymization through data synthesis using generative adversarial networks (ADS-GAN). IEEE J. Biomed. Health Informatics 24(8), 2378–2388 (2020) https://doi.org/10.1109/JBHI.2020.2980262 Xu et al. [2019] Xu, L., Skoularidou, M., Cuesta-Infante, A., Veeramachaneni, K.: Modeling tabular data using conditional GAN. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) 
Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 7333–7343 (2019) Goodfellow et al. [2014] Goodfellow, I.J., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., Bengio, Y.: Generative Adversarial Networks. arXiv, preprint (2014). https://doi.org/10.48550/arXiv.1406.2661 Sun and Erath [2015] Sun, L., Erath, A.: A bayesian network approach for population synthesis. Transportation Research Part C: Emerging Technologies 61, 49–62 (2015) https://doi.org/10.1016/j.trc.2015.10.010 Reiter [2005] Reiter, J.P.: Using cart to generate partially synthetic public use microdata. Journal of Official Statistics 21(3), 441–462 (2005) Drechsler and Reiter [2011] Drechsler, J., Reiter, J.P.: An empirical evaluation of easily implemented, nonparametric methods for generating synthetic datasets. Comput. Stat. Data Anal. 55(12), 3232–3243 (2011) European Medicines Agency [2018] European Medicines Agency: External guidance on the implementation of the European Medicines Agency policy on the publication of clinical data for medicinal products for human use. https://www.ema.europa.eu/en/human-regulatory/marketing-authorisation/clinical-data-publication/support-industry/external-guidance-implementation-european-medicines-agency-policy-publication-clinical-data (2018) Health Canada [2019] Health Canada: Public Release of Clinical Information: guidance document. https://www.canada.ca/en/health-canada/services/drug-health-product-review-approval/profile-public-release-clinical-information-guidance/document.html (2019) Karr et al. [2006] Karr, A.F., Kohnen, C.N., Oganian, A., Reiter, J.P., Sanil, A.P.: A framework for evaluating the utility of data altered to protect confidentiality. The American Statistician 60(3), 224–232 (2006) https://doi.org/10.1198/000313006x124640 Zhu et al. 
[2022] Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). https://doi.org/10.1109/BigData55660.2022.10020639 Villani [2009] Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. [2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. 
Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Yan, C., Yan, Y., Wan, Z., Zhang, Z., Omberg, L., Guinney, J., Mooney, S.D., Malin, B.A.: A multifaceted benchmarking of synthetic electronic health record generation models. Nature Communications 13(1) (2022) https://doi.org/10.1038/s41467-022-35295-1 Murtaza et al. 
[2023] Murtaza, H., Ahmed, M., Khan, N.F., Murtaza, G., Zafar, S., Bano, A.: Synthetic data generation: State of the art in health care domain. Computer Science Review 48, 100546 (2023) https://doi.org/10.1016/j.cosrev.2023.100546 [22] Lautrup, A.D., Hyrup, T., Zimek, A., Schneider-Kamp, P.: Systematised review of generative modelling tools and utility metrics for fully synthetic tabular data. [under review] Gower [1971] Gower, J.C.: A general coefficient of similarity and some of its properties. Biometrics 27(4), 857–871 (1971) https://doi.org/10.2307/2528823 Kamal et al. [2019] Kamal, S., ElEleimy, M., Hegazy, D., Nasr, M.: Hepatitis C Virus (HCV) for Egyptian patients. UCI Machine Learning Repository, dataset (2019). https://doi.org/10.24432/c5989v Chawla et al. [2002] Chawla, N.V., Bowyer, K.W., Hall, L.O., Kegelmeyer, W.P.: SMOTE: synthetic minority over-sampling technique. J. Artif. Intell. Res. 16, 321–357 (2002) https://doi.org/10.1613/jair.953 Yoon et al. [2020] Yoon, J., Drumright, L.N., Schaar, M.: Anonymization through data synthesis using generative adversarial networks (ADS-GAN). IEEE J. Biomed. Health Informatics 24(8), 2378–2388 (2020) https://doi.org/10.1109/JBHI.2020.2980262 Xu et al. [2019] Xu, L., Skoularidou, M., Cuesta-Infante, A., Veeramachaneni, K.: Modeling tabular data using conditional GAN. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 7333–7343 (2019) Goodfellow et al. [2014] Goodfellow, I.J., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., Bengio, Y.: Generative Adversarial Networks. arXiv, preprint (2014). https://doi.org/10.48550/arXiv.1406.2661 Sun and Erath [2015] Sun, L., Erath, A.: A bayesian network approach for population synthesis. 
Transportation Research Part C: Emerging Technologies 61, 49–62 (2015) https://doi.org/10.1016/j.trc.2015.10.010 Reiter [2005] Reiter, J.P.: Using cart to generate partially synthetic public use microdata. Journal of Official Statistics 21(3), 441–462 (2005) Drechsler and Reiter [2011] Drechsler, J., Reiter, J.P.: An empirical evaluation of easily implemented, nonparametric methods for generating synthetic datasets. Comput. Stat. Data Anal. 55(12), 3232–3243 (2011) European Medicines Agency [2018] European Medicines Agency: External guidance on the implementation of the European Medicines Agency policy on the publication of clinical data for medicinal products for human use. https://www.ema.europa.eu/en/human-regulatory/marketing-authorisation/clinical-data-publication/support-industry/external-guidance-implementation-european-medicines-agency-policy-publication-clinical-data (2018) Health Canada [2019] Health Canada: Public Release of Clinical Information: guidance document. https://www.canada.ca/en/health-canada/services/drug-health-product-review-approval/profile-public-release-clinical-information-guidance/document.html (2019) Karr et al. [2006] Karr, A.F., Kohnen, C.N., Oganian, A., Reiter, J.P., Sanil, A.P.: A framework for evaluating the utility of data altered to protect confidentiality. The American Statistician 60(3), 224–232 (2006) https://doi.org/10.1198/000313006x124640 Zhu et al. [2022] Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). https://doi.org/10.1109/BigData55660.2022.10020639 Villani [2009] Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. 
[2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). 
https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Murtaza, H., Ahmed, M., Khan, N.F., Murtaza, G., Zafar, S., Bano, A.: Synthetic data generation: State of the art in health care domain. Computer Science Review 48, 100546 (2023) https://doi.org/10.1016/j.cosrev.2023.100546 [22] Lautrup, A.D., Hyrup, T., Zimek, A., Schneider-Kamp, P.: Systematised review of generative modelling tools and utility metrics for fully synthetic tabular data. [under review] Gower [1971] Gower, J.C.: A general coefficient of similarity and some of its properties. Biometrics 27(4), 857–871 (1971) https://doi.org/10.2307/2528823 Kamal et al. [2019] Kamal, S., ElEleimy, M., Hegazy, D., Nasr, M.: Hepatitis C Virus (HCV) for Egyptian patients. UCI Machine Learning Repository, dataset (2019). https://doi.org/10.24432/c5989v Chawla et al. [2002] Chawla, N.V., Bowyer, K.W., Hall, L.O., Kegelmeyer, W.P.: SMOTE: synthetic minority over-sampling technique. J. Artif. Intell. Res. 16, 321–357 (2002) https://doi.org/10.1613/jair.953 Yoon et al. 
[2020] Yoon, J., Drumright, L.N., Schaar, M.: Anonymization through data synthesis using generative adversarial networks (ADS-GAN). IEEE J. Biomed. Health Informatics 24(8), 2378–2388 (2020) https://doi.org/10.1109/JBHI.2020.2980262 Xu et al. [2019] Xu, L., Skoularidou, M., Cuesta-Infante, A., Veeramachaneni, K.: Modeling tabular data using conditional GAN. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 7333–7343 (2019) Goodfellow et al. [2014] Goodfellow, I.J., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., Bengio, Y.: Generative Adversarial Networks. arXiv, preprint (2014). https://doi.org/10.48550/arXiv.1406.2661 Sun and Erath [2015] Sun, L., Erath, A.: A bayesian network approach for population synthesis. Transportation Research Part C: Emerging Technologies 61, 49–62 (2015) https://doi.org/10.1016/j.trc.2015.10.010 Reiter [2005] Reiter, J.P.: Using cart to generate partially synthetic public use microdata. Journal of Official Statistics 21(3), 441–462 (2005) Drechsler and Reiter [2011] Drechsler, J., Reiter, J.P.: An empirical evaluation of easily implemented, nonparametric methods for generating synthetic datasets. Comput. Stat. Data Anal. 55(12), 3232–3243 (2011) European Medicines Agency [2018] European Medicines Agency: External guidance on the implementation of the European Medicines Agency policy on the publication of clinical data for medicinal products for human use. https://www.ema.europa.eu/en/human-regulatory/marketing-authorisation/clinical-data-publication/support-industry/external-guidance-implementation-european-medicines-agency-policy-publication-clinical-data (2018) Health Canada [2019] Health Canada: Public Release of Clinical Information: guidance document. 
[2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Goodfellow, I.J., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., Bengio, Y.: Generative Adversarial Networks. arXiv, preprint (2014). https://doi.org/10.48550/arXiv.1406.2661 Sun and Erath [2015] Sun, L., Erath, A.: A bayesian network approach for population synthesis. Transportation Research Part C: Emerging Technologies 61, 49–62 (2015) https://doi.org/10.1016/j.trc.2015.10.010 Reiter [2005] Reiter, J.P.: Using cart to generate partially synthetic public use microdata. Journal of Official Statistics 21(3), 441–462 (2005) Drechsler and Reiter [2011] Drechsler, J., Reiter, J.P.: An empirical evaluation of easily implemented, nonparametric methods for generating synthetic datasets. Comput. Stat. Data Anal. 55(12), 3232–3243 (2011) European Medicines Agency [2018] European Medicines Agency: External guidance on the implementation of the European Medicines Agency policy on the publication of clinical data for medicinal products for human use. https://www.ema.europa.eu/en/human-regulatory/marketing-authorisation/clinical-data-publication/support-industry/external-guidance-implementation-european-medicines-agency-policy-publication-clinical-data (2018) Health Canada [2019] Health Canada: Public Release of Clinical Information: guidance document. https://www.canada.ca/en/health-canada/services/drug-health-product-review-approval/profile-public-release-clinical-information-guidance/document.html (2019) Karr et al. [2006] Karr, A.F., Kohnen, C.N., Oganian, A., Reiter, J.P., Sanil, A.P.: A framework for evaluating the utility of data altered to protect confidentiality. The American Statistician 60(3), 224–232 (2006) https://doi.org/10.1198/000313006x124640 Zhu et al. [2022] Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. 
In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). https://doi.org/10.1109/BigData55660.2022.10020639 Villani [2009] Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. [2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 
14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Sun, L., Erath, A.: A bayesian network approach for population synthesis. Transportation Research Part C: Emerging Technologies 61, 49–62 (2015) https://doi.org/10.1016/j.trc.2015.10.010 Reiter [2005] Reiter, J.P.: Using cart to generate partially synthetic public use microdata. Journal of Official Statistics 21(3), 441–462 (2005) Drechsler and Reiter [2011] Drechsler, J., Reiter, J.P.: An empirical evaluation of easily implemented, nonparametric methods for generating synthetic datasets. Comput. Stat. Data Anal. 
55(12), 3232–3243 (2011) European Medicines Agency [2018] European Medicines Agency: External guidance on the implementation of the European Medicines Agency policy on the publication of clinical data for medicinal products for human use. https://www.ema.europa.eu/en/human-regulatory/marketing-authorisation/clinical-data-publication/support-industry/external-guidance-implementation-european-medicines-agency-policy-publication-clinical-data (2018) Health Canada [2019] Health Canada: Public Release of Clinical Information: guidance document. https://www.canada.ca/en/health-canada/services/drug-health-product-review-approval/profile-public-release-clinical-information-guidance/document.html (2019) Karr et al. [2006] Karr, A.F., Kohnen, C.N., Oganian, A., Reiter, J.P., Sanil, A.P.: A framework for evaluating the utility of data altered to protect confidentiality. The American Statistician 60(3), 224–232 (2006) https://doi.org/10.1198/000313006x124640 Zhu et al. [2022] Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). https://doi.org/10.1109/BigData55660.2022.10020639 Villani [2009] Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. [2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. 
IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. 
[2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Reiter, J.P.: Using cart to generate partially synthetic public use microdata. Journal of Official Statistics 21(3), 441–462 (2005) Drechsler and Reiter [2011] Drechsler, J., Reiter, J.P.: An empirical evaluation of easily implemented, nonparametric methods for generating synthetic datasets. Comput. Stat. Data Anal. 55(12), 3232–3243 (2011) European Medicines Agency [2018] European Medicines Agency: External guidance on the implementation of the European Medicines Agency policy on the publication of clinical data for medicinal products for human use. https://www.ema.europa.eu/en/human-regulatory/marketing-authorisation/clinical-data-publication/support-industry/external-guidance-implementation-european-medicines-agency-policy-publication-clinical-data (2018) Health Canada [2019] Health Canada: Public Release of Clinical Information: guidance document. https://www.canada.ca/en/health-canada/services/drug-health-product-review-approval/profile-public-release-clinical-information-guidance/document.html (2019) Karr et al. [2006] Karr, A.F., Kohnen, C.N., Oganian, A., Reiter, J.P., Sanil, A.P.: A framework for evaluating the utility of data altered to protect confidentiality. The American Statistician 60(3), 224–232 (2006) https://doi.org/10.1198/000313006x124640 Zhu et al. [2022] Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) 
IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). https://doi.org/10.1109/BigData55660.2022.10020639 Villani [2009] Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. [2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. 
[2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Drechsler, J., Reiter, J.P.: An empirical evaluation of easily implemented, nonparametric methods for generating synthetic datasets. Comput. Stat. Data Anal. 55(12), 3232–3243 (2011) European Medicines Agency [2018] European Medicines Agency: External guidance on the implementation of the European Medicines Agency policy on the publication of clinical data for medicinal products for human use. https://www.ema.europa.eu/en/human-regulatory/marketing-authorisation/clinical-data-publication/support-industry/external-guidance-implementation-european-medicines-agency-policy-publication-clinical-data (2018) Health Canada [2019] Health Canada: Public Release of Clinical Information: guidance document. 
https://www.canada.ca/en/health-canada/services/drug-health-product-review-approval/profile-public-release-clinical-information-guidance/document.html (2019) Karr et al. [2006] Karr, A.F., Kohnen, C.N., Oganian, A., Reiter, J.P., Sanil, A.P.: A framework for evaluating the utility of data altered to protect confidentiality. The American Statistician 60(3), 224–232 (2006) https://doi.org/10.1198/000313006x124640 Zhu et al. [2022] Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). https://doi.org/10.1109/BigData55660.2022.10020639 Villani [2009] Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. [2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. 
arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. 
JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 European Medicines Agency: External guidance on the implementation of the European Medicines Agency policy on the publication of clinical data for medicinal products for human use. https://www.ema.europa.eu/en/human-regulatory/marketing-authorisation/clinical-data-publication/support-industry/external-guidance-implementation-european-medicines-agency-policy-publication-clinical-data (2018) Health Canada [2019] Health Canada: Public Release of Clinical Information: guidance document. https://www.canada.ca/en/health-canada/services/drug-health-product-review-approval/profile-public-release-clinical-information-guidance/document.html (2019) Karr et al. [2006] Karr, A.F., Kohnen, C.N., Oganian, A., Reiter, J.P., Sanil, A.P.: A framework for evaluating the utility of data altered to protect confidentiality. The American Statistician 60(3), 224–232 (2006) https://doi.org/10.1198/000313006x124640 Zhu et al. [2022] Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). https://doi.org/10.1109/BigData55660.2022.10020639 Villani [2009] Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. [2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. 
IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. 
[2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Health Canada: Public Release of Clinical Information: guidance document. https://www.canada.ca/en/health-canada/services/drug-health-product-review-approval/profile-public-release-clinical-information-guidance/document.html (2019) Karr et al. [2006] Karr, A.F., Kohnen, C.N., Oganian, A., Reiter, J.P., Sanil, A.P.: A framework for evaluating the utility of data altered to protect confidentiality. The American Statistician 60(3), 224–232 (2006) https://doi.org/10.1198/000313006x124640 Zhu et al. [2022] Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). https://doi.org/10.1109/BigData55660.2022.10020639 Villani [2009] Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. [2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. 
  7. Rankin, D., Black, M., Bond, R., Wallace, J., Mulvenna, M., Epelde, G.: Reliability of supervised machine learning using synthetic data in health care: Model to preserve privacy for data sharing. JMIR Medical Informatics 8(7), 18910 (2020) https://doi.org/10.2196/18910 van Breugel et al. [2021] Breugel, B., Kyono, T., Berrevoets, J., Schaar, M.: DECAF: generating fair synthetic data using causally-aware generative networks. In: Ranzato, M., Beygelzimer, A., Dauphin, Y.N., Liang, P., Vaughan, J.W. (eds.) Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6-14, 2021, Virtual, pp. 22221–22233 (2021) Dankar et al. [2022] Dankar, F.K., Ibrahim, M.K., Ismail, L.: A multi-dimensional evaluation of synthetic data generators. IEEE Access 10, 11147–11158 (2022) https://doi.org/10.1109/access.2022.3144765 Figueira and Vaz [2022] Figueira, A., Vaz, B.: Survey on synthetic data generation, evaluation methods and GANs. Mathematics 10(15), 2733 (2022) https://doi.org/10.3390/math10152733 Dwork and Roth [2013] Dwork, C., Roth, A.: The algorithmic foundations of differential privacy. Foundations and Trends® in Theoretical Computer Science 9(3-4), 211–487 (2013) https://doi.org/10.1561/0400000042 Yale et al. [2020] Yale, A., Dash, S., Bhanot, K., Guyon, I., Erickson, J.S., Bennett, K.P.: Synthesizing quality open data assets from private health research studies. In: Abramowicz, W., Klein, G. (eds.) Business Information Systems Workshops - BIS 2020 International Workshops, Colorado Springs, CO, USA, June 8-10, 2020, Revised Selected Papers. Lecture Notes in Business Information Processing, vol. 394, pp. 324–335 (2020). https://doi.org/10.1007/978-3-030-61146-0_26 Lenz et al. [2021] Lenz, S., Hess, M., Binder, H.: Deep generative models in DataSHIELD. BMC Medical Research Methodology 21(1) (2021) https://doi.org/10.1186/s12874-021-01237-6 Emam et al. 
[2022] Emam, K.E., Mosquera, L., Fang, X., El-Hussuna, A.: Utility metrics for evaluating synthetic health data generation methods: Validation study. JMIR Medical Informatics 10(4), 35734 (2022) https://doi.org/10.2196/35734 Qian et al. [2023] Qian, Z., Cebere, B., Schaar, M.: Synthcity: facilitating innovative use cases of synthetic data in different data modalities. arXiv, preprint (2023). https://doi.org/10.48550/arXiv.2301.07573 DataCebo, Inc. [2023] DataCebo, Inc.: Synthetic Data Metrics. (2023). DataCebo, Inc.. Version 0.9.3. https://docs.sdv.dev/sdmetrics/ Brenninkmeijer [2021] Brenninkmeijer, B.: Table Evaluator. GitHub (2021) Yale et al. [2019] Yale, A., Dash, S., Dutta, R., Guyon, I., Pavao, A., Bennett, K.P.: Privacy preserving synthetic health data. In: 27th European Symposium on Artificial Neural Networks, ESANN 2019, Bruges, Belgium, April 24-26, 2019 (2019) Yan et al. [2022] Yan, C., Yan, Y., Wan, Z., Zhang, Z., Omberg, L., Guinney, J., Mooney, S.D., Malin, B.A.: A multifaceted benchmarking of synthetic electronic health record generation models. Nature Communications 13(1) (2022) https://doi.org/10.1038/s41467-022-35295-1 Murtaza et al. [2023] Murtaza, H., Ahmed, M., Khan, N.F., Murtaza, G., Zafar, S., Bano, A.: Synthetic data generation: State of the art in health care domain. Computer Science Review 48, 100546 (2023) https://doi.org/10.1016/j.cosrev.2023.100546 [22] Lautrup, A.D., Hyrup, T., Zimek, A., Schneider-Kamp, P.: Systematised review of generative modelling tools and utility metrics for fully synthetic tabular data. [under review] Gower [1971] Gower, J.C.: A general coefficient of similarity and some of its properties. Biometrics 27(4), 857–871 (1971) https://doi.org/10.2307/2528823 Kamal et al. [2019] Kamal, S., ElEleimy, M., Hegazy, D., Nasr, M.: Hepatitis C Virus (HCV) for Egyptian patients. UCI Machine Learning Repository, dataset (2019). https://doi.org/10.24432/c5989v Chawla et al. 
[2002] Chawla, N.V., Bowyer, K.W., Hall, L.O., Kegelmeyer, W.P.: SMOTE: synthetic minority over-sampling technique. J. Artif. Intell. Res. 16, 321–357 (2002) https://doi.org/10.1613/jair.953 Yoon et al. [2020] Yoon, J., Drumright, L.N., Schaar, M.: Anonymization through data synthesis using generative adversarial networks (ADS-GAN). IEEE J. Biomed. Health Informatics 24(8), 2378–2388 (2020) https://doi.org/10.1109/JBHI.2020.2980262 Xu et al. [2019] Xu, L., Skoularidou, M., Cuesta-Infante, A., Veeramachaneni, K.: Modeling tabular data using conditional GAN. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 7333–7343 (2019) Goodfellow et al. [2014] Goodfellow, I.J., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., Bengio, Y.: Generative Adversarial Networks. arXiv, preprint (2014). https://doi.org/10.48550/arXiv.1406.2661 Sun and Erath [2015] Sun, L., Erath, A.: A bayesian network approach for population synthesis. Transportation Research Part C: Emerging Technologies 61, 49–62 (2015) https://doi.org/10.1016/j.trc.2015.10.010 Reiter [2005] Reiter, J.P.: Using cart to generate partially synthetic public use microdata. Journal of Official Statistics 21(3), 441–462 (2005) Drechsler and Reiter [2011] Drechsler, J., Reiter, J.P.: An empirical evaluation of easily implemented, nonparametric methods for generating synthetic datasets. Comput. Stat. Data Anal. 55(12), 3232–3243 (2011) European Medicines Agency [2018] European Medicines Agency: External guidance on the implementation of the European Medicines Agency policy on the publication of clinical data for medicinal products for human use. 
https://www.ema.europa.eu/en/human-regulatory/marketing-authorisation/clinical-data-publication/support-industry/external-guidance-implementation-european-medicines-agency-policy-publication-clinical-data (2018) Health Canada [2019] Health Canada: Public Release of Clinical Information: guidance document. https://www.canada.ca/en/health-canada/services/drug-health-product-review-approval/profile-public-release-clinical-information-guidance/document.html (2019) Karr et al. [2006] Karr, A.F., Kohnen, C.N., Oganian, A., Reiter, J.P., Sanil, A.P.: A framework for evaluating the utility of data altered to protect confidentiality. The American Statistician 60(3), 224–232 (2006) https://doi.org/10.1198/000313006x124640 Zhu et al. [2022] Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). https://doi.org/10.1109/BigData55660.2022.10020639 Villani [2009] Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. [2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. 
[2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. 
[2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Breugel, B., Kyono, T., Berrevoets, J., Schaar, M.: DECAF: generating fair synthetic data using causally-aware generative networks. In: Ranzato, M., Beygelzimer, A., Dauphin, Y.N., Liang, P., Vaughan, J.W. (eds.) Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6-14, 2021, Virtual, pp. 22221–22233 (2021) Dankar et al. [2022] Dankar, F.K., Ibrahim, M.K., Ismail, L.: A multi-dimensional evaluation of synthetic data generators. IEEE Access 10, 11147–11158 (2022) https://doi.org/10.1109/access.2022.3144765 Figueira and Vaz [2022] Figueira, A., Vaz, B.: Survey on synthetic data generation, evaluation methods and GANs. Mathematics 10(15), 2733 (2022) https://doi.org/10.3390/math10152733 Dwork and Roth [2013] Dwork, C., Roth, A.: The algorithmic foundations of differential privacy. Foundations and Trends® in Theoretical Computer Science 9(3-4), 211–487 (2013) https://doi.org/10.1561/0400000042 Yale et al. [2020] Yale, A., Dash, S., Bhanot, K., Guyon, I., Erickson, J.S., Bennett, K.P.: Synthesizing quality open data assets from private health research studies. In: Abramowicz, W., Klein, G. (eds.) Business Information Systems Workshops - BIS 2020 International Workshops, Colorado Springs, CO, USA, June 8-10, 2020, Revised Selected Papers. Lecture Notes in Business Information Processing, vol. 394, pp. 324–335 (2020). https://doi.org/10.1007/978-3-030-61146-0_26 Lenz et al. [2021] Lenz, S., Hess, M., Binder, H.: Deep generative models in DataSHIELD. BMC Medical Research Methodology 21(1) (2021) https://doi.org/10.1186/s12874-021-01237-6 Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X., El-Hussuna, A.: Utility metrics for evaluating synthetic health data generation methods: Validation study. 
JMIR Medical Informatics 10(4), 35734 (2022) https://doi.org/10.2196/35734 Qian et al. [2023] Qian, Z., Cebere, B., Schaar, M.: Synthcity: facilitating innovative use cases of synthetic data in different data modalities. arXiv, preprint (2023). https://doi.org/10.48550/arXiv.2301.07573 DataCebo, Inc. [2023] DataCebo, Inc.: Synthetic Data Metrics. (2023). DataCebo, Inc.. Version 0.9.3. https://docs.sdv.dev/sdmetrics/ Brenninkmeijer [2021] Brenninkmeijer, B.: Table Evaluator. GitHub (2021) Yale et al. [2019] Yale, A., Dash, S., Dutta, R., Guyon, I., Pavao, A., Bennett, K.P.: Privacy preserving synthetic health data. In: 27th European Symposium on Artificial Neural Networks, ESANN 2019, Bruges, Belgium, April 24-26, 2019 (2019) Yan et al. [2022] Yan, C., Yan, Y., Wan, Z., Zhang, Z., Omberg, L., Guinney, J., Mooney, S.D., Malin, B.A.: A multifaceted benchmarking of synthetic electronic health record generation models. Nature Communications 13(1) (2022) https://doi.org/10.1038/s41467-022-35295-1 Murtaza et al. [2023] Murtaza, H., Ahmed, M., Khan, N.F., Murtaza, G., Zafar, S., Bano, A.: Synthetic data generation: State of the art in health care domain. Computer Science Review 48, 100546 (2023) https://doi.org/10.1016/j.cosrev.2023.100546 [22] Lautrup, A.D., Hyrup, T., Zimek, A., Schneider-Kamp, P.: Systematised review of generative modelling tools and utility metrics for fully synthetic tabular data. [under review] Gower [1971] Gower, J.C.: A general coefficient of similarity and some of its properties. Biometrics 27(4), 857–871 (1971) https://doi.org/10.2307/2528823 Kamal et al. [2019] Kamal, S., ElEleimy, M., Hegazy, D., Nasr, M.: Hepatitis C Virus (HCV) for Egyptian patients. UCI Machine Learning Repository, dataset (2019). https://doi.org/10.24432/c5989v Chawla et al. [2002] Chawla, N.V., Bowyer, K.W., Hall, L.O., Kegelmeyer, W.P.: SMOTE: synthetic minority over-sampling technique. J. Artif. Intell. Res. 16, 321–357 (2002) https://doi.org/10.1613/jair.953 Yoon et al. 
[2020] Yoon, J., Drumright, L.N., Schaar, M.: Anonymization through data synthesis using generative adversarial networks (ADS-GAN). IEEE J. Biomed. Health Informatics 24(8), 2378–2388 (2020) https://doi.org/10.1109/JBHI.2020.2980262 Xu et al. [2019] Xu, L., Skoularidou, M., Cuesta-Infante, A., Veeramachaneni, K.: Modeling tabular data using conditional GAN. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 7333–7343 (2019) Goodfellow et al. [2014] Goodfellow, I.J., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., Bengio, Y.: Generative Adversarial Networks. arXiv, preprint (2014). https://doi.org/10.48550/arXiv.1406.2661 Sun and Erath [2015] Sun, L., Erath, A.: A bayesian network approach for population synthesis. Transportation Research Part C: Emerging Technologies 61, 49–62 (2015) https://doi.org/10.1016/j.trc.2015.10.010 Reiter [2005] Reiter, J.P.: Using cart to generate partially synthetic public use microdata. Journal of Official Statistics 21(3), 441–462 (2005) Drechsler and Reiter [2011] Drechsler, J., Reiter, J.P.: An empirical evaluation of easily implemented, nonparametric methods for generating synthetic datasets. Comput. Stat. Data Anal. 55(12), 3232–3243 (2011) European Medicines Agency [2018] European Medicines Agency: External guidance on the implementation of the European Medicines Agency policy on the publication of clinical data for medicinal products for human use. https://www.ema.europa.eu/en/human-regulatory/marketing-authorisation/clinical-data-publication/support-industry/external-guidance-implementation-european-medicines-agency-policy-publication-clinical-data (2018) Health Canada [2019] Health Canada: Public Release of Clinical Information: guidance document. 
https://www.canada.ca/en/health-canada/services/drug-health-product-review-approval/profile-public-release-clinical-information-guidance/document.html (2019) Karr et al. [2006] Karr, A.F., Kohnen, C.N., Oganian, A., Reiter, J.P., Sanil, A.P.: A framework for evaluating the utility of data altered to protect confidentiality. The American Statistician 60(3), 224–232 (2006) https://doi.org/10.1198/000313006x124640 Zhu et al. [2022] Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). https://doi.org/10.1109/BigData55660.2022.10020639 Villani [2009] Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. [2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. 
arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Dankar, F.K., Ibrahim, M.K., Ismail, L.: A multi-dimensional evaluation of synthetic data generators. 
IEEE Access 10, 11147–11158 (2022) https://doi.org/10.1109/access.2022.3144765 Figueira and Vaz [2022] Figueira, A., Vaz, B.: Survey on synthetic data generation, evaluation methods and GANs. Mathematics 10(15), 2733 (2022) https://doi.org/10.3390/math10152733 Dwork and Roth [2013] Dwork, C., Roth, A.: The algorithmic foundations of differential privacy. Foundations and Trends® in Theoretical Computer Science 9(3-4), 211–487 (2013) https://doi.org/10.1561/0400000042 Yale et al. [2020] Yale, A., Dash, S., Bhanot, K., Guyon, I., Erickson, J.S., Bennett, K.P.: Synthesizing quality open data assets from private health research studies. In: Abramowicz, W., Klein, G. (eds.) Business Information Systems Workshops - BIS 2020 International Workshops, Colorado Springs, CO, USA, June 8-10, 2020, Revised Selected Papers. Lecture Notes in Business Information Processing, vol. 394, pp. 324–335 (2020). https://doi.org/10.1007/978-3-030-61146-0_26 Lenz et al. [2021] Lenz, S., Hess, M., Binder, H.: Deep generative models in DataSHIELD. BMC Medical Research Methodology 21(1) (2021) https://doi.org/10.1186/s12874-021-01237-6 Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X., El-Hussuna, A.: Utility metrics for evaluating synthetic health data generation methods: Validation study. JMIR Medical Informatics 10(4), 35734 (2022) https://doi.org/10.2196/35734 Qian et al. [2023] Qian, Z., Cebere, B., Schaar, M.: Synthcity: facilitating innovative use cases of synthetic data in different data modalities. arXiv, preprint (2023). https://doi.org/10.48550/arXiv.2301.07573 DataCebo, Inc. [2023] DataCebo, Inc.: Synthetic Data Metrics. (2023). DataCebo, Inc.. Version 0.9.3. https://docs.sdv.dev/sdmetrics/ Brenninkmeijer [2021] Brenninkmeijer, B.: Table Evaluator. GitHub (2021) Yale et al. [2019] Yale, A., Dash, S., Dutta, R., Guyon, I., Pavao, A., Bennett, K.P.: Privacy preserving synthetic health data. 
In: 27th European Symposium on Artificial Neural Networks, ESANN 2019, Bruges, Belgium, April 24-26, 2019 (2019) Yan et al. [2022] Yan, C., Yan, Y., Wan, Z., Zhang, Z., Omberg, L., Guinney, J., Mooney, S.D., Malin, B.A.: A multifaceted benchmarking of synthetic electronic health record generation models. Nature Communications 13(1) (2022) https://doi.org/10.1038/s41467-022-35295-1 Murtaza et al. [2023] Murtaza, H., Ahmed, M., Khan, N.F., Murtaza, G., Zafar, S., Bano, A.: Synthetic data generation: State of the art in health care domain. Computer Science Review 48, 100546 (2023) https://doi.org/10.1016/j.cosrev.2023.100546 [22] Lautrup, A.D., Hyrup, T., Zimek, A., Schneider-Kamp, P.: Systematised review of generative modelling tools and utility metrics for fully synthetic tabular data. [under review] Gower [1971] Gower, J.C.: A general coefficient of similarity and some of its properties. Biometrics 27(4), 857–871 (1971) https://doi.org/10.2307/2528823 Kamal et al. [2019] Kamal, S., ElEleimy, M., Hegazy, D., Nasr, M.: Hepatitis C Virus (HCV) for Egyptian patients. UCI Machine Learning Repository, dataset (2019). https://doi.org/10.24432/c5989v Chawla et al. [2002] Chawla, N.V., Bowyer, K.W., Hall, L.O., Kegelmeyer, W.P.: SMOTE: synthetic minority over-sampling technique. J. Artif. Intell. Res. 16, 321–357 (2002) https://doi.org/10.1613/jair.953 Yoon et al. [2020] Yoon, J., Drumright, L.N., Schaar, M.: Anonymization through data synthesis using generative adversarial networks (ADS-GAN). IEEE J. Biomed. Health Informatics 24(8), 2378–2388 (2020) https://doi.org/10.1109/JBHI.2020.2980262 Xu et al. [2019] Xu, L., Skoularidou, M., Cuesta-Infante, A., Veeramachaneni, K.: Modeling tabular data using conditional GAN. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) 
Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 7333–7343 (2019) Goodfellow et al. [2014] Goodfellow, I.J., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., Bengio, Y.: Generative Adversarial Networks. arXiv, preprint (2014). https://doi.org/10.48550/arXiv.1406.2661 Sun and Erath [2015] Sun, L., Erath, A.: A bayesian network approach for population synthesis. Transportation Research Part C: Emerging Technologies 61, 49–62 (2015) https://doi.org/10.1016/j.trc.2015.10.010 Reiter [2005] Reiter, J.P.: Using cart to generate partially synthetic public use microdata. Journal of Official Statistics 21(3), 441–462 (2005) Drechsler and Reiter [2011] Drechsler, J., Reiter, J.P.: An empirical evaluation of easily implemented, nonparametric methods for generating synthetic datasets. Comput. Stat. Data Anal. 55(12), 3232–3243 (2011) European Medicines Agency [2018] European Medicines Agency: External guidance on the implementation of the European Medicines Agency policy on the publication of clinical data for medicinal products for human use. https://www.ema.europa.eu/en/human-regulatory/marketing-authorisation/clinical-data-publication/support-industry/external-guidance-implementation-european-medicines-agency-policy-publication-clinical-data (2018) Health Canada [2019] Health Canada: Public Release of Clinical Information: guidance document. https://www.canada.ca/en/health-canada/services/drug-health-product-review-approval/profile-public-release-clinical-information-guidance/document.html (2019) Karr et al. [2006] Karr, A.F., Kohnen, C.N., Oganian, A., Reiter, J.P., Sanil, A.P.: A framework for evaluating the utility of data altered to protect confidentiality. The American Statistician 60(3), 224–232 (2006) https://doi.org/10.1198/000313006x124640 Zhu et al. 
[2022] Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). https://doi.org/10.1109/BigData55660.2022.10020639 Villani [2009] Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. [2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. 
Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Figueira, A., Vaz, B.: Survey on synthetic data generation, evaluation methods and GANs. Mathematics 10(15), 2733 (2022) https://doi.org/10.3390/math10152733 Dwork and Roth [2013] Dwork, C., Roth, A.: The algorithmic foundations of differential privacy. 
Foundations and Trends® in Theoretical Computer Science 9(3-4), 211–487 (2013) https://doi.org/10.1561/0400000042 Yale et al. [2020] Yale, A., Dash, S., Bhanot, K., Guyon, I., Erickson, J.S., Bennett, K.P.: Synthesizing quality open data assets from private health research studies. In: Abramowicz, W., Klein, G. (eds.) Business Information Systems Workshops - BIS 2020 International Workshops, Colorado Springs, CO, USA, June 8-10, 2020, Revised Selected Papers. Lecture Notes in Business Information Processing, vol. 394, pp. 324–335 (2020). https://doi.org/10.1007/978-3-030-61146-0_26 Lenz et al. [2021] Lenz, S., Hess, M., Binder, H.: Deep generative models in DataSHIELD. BMC Medical Research Methodology 21(1) (2021) https://doi.org/10.1186/s12874-021-01237-6 Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X., El-Hussuna, A.: Utility metrics for evaluating synthetic health data generation methods: Validation study. JMIR Medical Informatics 10(4), 35734 (2022) https://doi.org/10.2196/35734 Qian et al. [2023] Qian, Z., Cebere, B., Schaar, M.: Synthcity: facilitating innovative use cases of synthetic data in different data modalities. arXiv, preprint (2023). https://doi.org/10.48550/arXiv.2301.07573 DataCebo, Inc. [2023] DataCebo, Inc.: Synthetic Data Metrics. (2023). DataCebo, Inc.. Version 0.9.3. https://docs.sdv.dev/sdmetrics/ Brenninkmeijer [2021] Brenninkmeijer, B.: Table Evaluator. GitHub (2021) Yale et al. [2019] Yale, A., Dash, S., Dutta, R., Guyon, I., Pavao, A., Bennett, K.P.: Privacy preserving synthetic health data. In: 27th European Symposium on Artificial Neural Networks, ESANN 2019, Bruges, Belgium, April 24-26, 2019 (2019) Yan et al. [2022] Yan, C., Yan, Y., Wan, Z., Zhang, Z., Omberg, L., Guinney, J., Mooney, S.D., Malin, B.A.: A multifaceted benchmarking of synthetic electronic health record generation models. Nature Communications 13(1) (2022) https://doi.org/10.1038/s41467-022-35295-1 Murtaza et al. 
[2023] Murtaza, H., Ahmed, M., Khan, N.F., Murtaza, G., Zafar, S., Bano, A.: Synthetic data generation: State of the art in health care domain. Computer Science Review 48, 100546 (2023) https://doi.org/10.1016/j.cosrev.2023.100546 [22] Lautrup, A.D., Hyrup, T., Zimek, A., Schneider-Kamp, P.: Systematised review of generative modelling tools and utility metrics for fully synthetic tabular data. [under review] Gower [1971] Gower, J.C.: A general coefficient of similarity and some of its properties. Biometrics 27(4), 857–871 (1971) https://doi.org/10.2307/2528823 Kamal et al. [2019] Kamal, S., ElEleimy, M., Hegazy, D., Nasr, M.: Hepatitis C Virus (HCV) for Egyptian patients. UCI Machine Learning Repository, dataset (2019). https://doi.org/10.24432/c5989v Chawla et al. [2002] Chawla, N.V., Bowyer, K.W., Hall, L.O., Kegelmeyer, W.P.: SMOTE: synthetic minority over-sampling technique. J. Artif. Intell. Res. 16, 321–357 (2002) https://doi.org/10.1613/jair.953 Yoon et al. [2020] Yoon, J., Drumright, L.N., Schaar, M.: Anonymization through data synthesis using generative adversarial networks (ADS-GAN). IEEE J. Biomed. Health Informatics 24(8), 2378–2388 (2020) https://doi.org/10.1109/JBHI.2020.2980262 Xu et al. [2019] Xu, L., Skoularidou, M., Cuesta-Infante, A., Veeramachaneni, K.: Modeling tabular data using conditional GAN. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 7333–7343 (2019) Goodfellow et al. [2014] Goodfellow, I.J., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., Bengio, Y.: Generative Adversarial Networks. arXiv, preprint (2014). https://doi.org/10.48550/arXiv.1406.2661 Sun and Erath [2015] Sun, L., Erath, A.: A bayesian network approach for population synthesis. 
Transportation Research Part C: Emerging Technologies 61, 49–62 (2015) https://doi.org/10.1016/j.trc.2015.10.010 Reiter [2005] Reiter, J.P.: Using cart to generate partially synthetic public use microdata. Journal of Official Statistics 21(3), 441–462 (2005) Drechsler and Reiter [2011] Drechsler, J., Reiter, J.P.: An empirical evaluation of easily implemented, nonparametric methods for generating synthetic datasets. Comput. Stat. Data Anal. 55(12), 3232–3243 (2011) European Medicines Agency [2018] European Medicines Agency: External guidance on the implementation of the European Medicines Agency policy on the publication of clinical data for medicinal products for human use. https://www.ema.europa.eu/en/human-regulatory/marketing-authorisation/clinical-data-publication/support-industry/external-guidance-implementation-european-medicines-agency-policy-publication-clinical-data (2018) Health Canada [2019] Health Canada: Public Release of Clinical Information: guidance document. https://www.canada.ca/en/health-canada/services/drug-health-product-review-approval/profile-public-release-clinical-information-guidance/document.html (2019) Karr et al. [2006] Karr, A.F., Kohnen, C.N., Oganian, A., Reiter, J.P., Sanil, A.P.: A framework for evaluating the utility of data altered to protect confidentiality. The American Statistician 60(3), 224–232 (2006) https://doi.org/10.1198/000313006x124640 Zhu et al. [2022] Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). https://doi.org/10.1109/BigData55660.2022.10020639 Villani [2009] Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. 
[2006] Karr, A.F., Kohnen, C.N., Oganian, A., Reiter, J.P., Sanil, A.P.: A framework for evaluating the utility of data altered to protect confidentiality. The American Statistician 60(3), 224–232 (2006) https://doi.org/10.1198/000313006x124640 Zhu et al. [2022] Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). https://doi.org/10.1109/BigData55660.2022.10020639 Villani [2009] Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. [2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. 
[2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 DataCebo, Inc.: Synthetic Data Metrics. (2023). DataCebo, Inc.. Version 0.9.3. https://docs.sdv.dev/sdmetrics/ Brenninkmeijer [2021] Brenninkmeijer, B.: Table Evaluator. GitHub (2021) Yale et al. 
[2019] Yale, A., Dash, S., Dutta, R., Guyon, I., Pavao, A., Bennett, K.P.: Privacy preserving synthetic health data. In: 27th European Symposium on Artificial Neural Networks, ESANN 2019, Bruges, Belgium, April 24-26, 2019 (2019) Yan et al. [2022] Yan, C., Yan, Y., Wan, Z., Zhang, Z., Omberg, L., Guinney, J., Mooney, S.D., Malin, B.A.: A multifaceted benchmarking of synthetic electronic health record generation models. Nature Communications 13(1) (2022) https://doi.org/10.1038/s41467-022-35295-1 Murtaza et al. [2023] Murtaza, H., Ahmed, M., Khan, N.F., Murtaza, G., Zafar, S., Bano, A.: Synthetic data generation: State of the art in health care domain. Computer Science Review 48, 100546 (2023) https://doi.org/10.1016/j.cosrev.2023.100546 [22] Lautrup, A.D., Hyrup, T., Zimek, A., Schneider-Kamp, P.: Systematised review of generative modelling tools and utility metrics for fully synthetic tabular data. [under review] Gower [1971] Gower, J.C.: A general coefficient of similarity and some of its properties. Biometrics 27(4), 857–871 (1971) https://doi.org/10.2307/2528823 Kamal et al. [2019] Kamal, S., ElEleimy, M., Hegazy, D., Nasr, M.: Hepatitis C Virus (HCV) for Egyptian patients. UCI Machine Learning Repository, dataset (2019). https://doi.org/10.24432/c5989v Chawla et al. [2002] Chawla, N.V., Bowyer, K.W., Hall, L.O., Kegelmeyer, W.P.: SMOTE: synthetic minority over-sampling technique. J. Artif. Intell. Res. 16, 321–357 (2002) https://doi.org/10.1613/jair.953 Yoon et al. [2020] Yoon, J., Drumright, L.N., Schaar, M.: Anonymization through data synthesis using generative adversarial networks (ADS-GAN). IEEE J. Biomed. Health Informatics 24(8), 2378–2388 (2020) https://doi.org/10.1109/JBHI.2020.2980262 Xu et al. [2019] Xu, L., Skoularidou, M., Cuesta-Infante, A., Veeramachaneni, K.: Modeling tabular data using conditional GAN. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) 
Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 7333–7343 (2019) Goodfellow et al. [2014] Goodfellow, I.J., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., Bengio, Y.: Generative Adversarial Networks. arXiv, preprint (2014). https://doi.org/10.48550/arXiv.1406.2661 Sun and Erath [2015] Sun, L., Erath, A.: A bayesian network approach for population synthesis. Transportation Research Part C: Emerging Technologies 61, 49–62 (2015) https://doi.org/10.1016/j.trc.2015.10.010 Reiter [2005] Reiter, J.P.: Using cart to generate partially synthetic public use microdata. Journal of Official Statistics 21(3), 441–462 (2005) Drechsler and Reiter [2011] Drechsler, J., Reiter, J.P.: An empirical evaluation of easily implemented, nonparametric methods for generating synthetic datasets. Comput. Stat. Data Anal. 55(12), 3232–3243 (2011) European Medicines Agency [2018] European Medicines Agency: External guidance on the implementation of the European Medicines Agency policy on the publication of clinical data for medicinal products for human use. https://www.ema.europa.eu/en/human-regulatory/marketing-authorisation/clinical-data-publication/support-industry/external-guidance-implementation-european-medicines-agency-policy-publication-clinical-data (2018) Health Canada [2019] Health Canada: Public Release of Clinical Information: guidance document. https://www.canada.ca/en/health-canada/services/drug-health-product-review-approval/profile-public-release-clinical-information-guidance/document.html (2019) Karr et al. [2006] Karr, A.F., Kohnen, C.N., Oganian, A., Reiter, J.P., Sanil, A.P.: A framework for evaluating the utility of data altered to protect confidentiality. The American Statistician 60(3), 224–232 (2006) https://doi.org/10.1198/000313006x124640 Zhu et al. 
[2022] Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). https://doi.org/10.1109/BigData55660.2022.10020639 Villani [2009] Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. [2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. 
Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Brenninkmeijer, B.: Table Evaluator. GitHub (2021) Yale et al. [2019] Yale, A., Dash, S., Dutta, R., Guyon, I., Pavao, A., Bennett, K.P.: Privacy preserving synthetic health data. In: 27th European Symposium on Artificial Neural Networks, ESANN 2019, Bruges, Belgium, April 24-26, 2019 (2019) Yan et al. 
[2022] Yan, C., Yan, Y., Wan, Z., Zhang, Z., Omberg, L., Guinney, J., Mooney, S.D., Malin, B.A.: A multifaceted benchmarking of synthetic electronic health record generation models. Nature Communications 13(1) (2022) https://doi.org/10.1038/s41467-022-35295-1 Murtaza et al. [2023] Murtaza, H., Ahmed, M., Khan, N.F., Murtaza, G., Zafar, S., Bano, A.: Synthetic data generation: State of the art in health care domain. Computer Science Review 48, 100546 (2023) https://doi.org/10.1016/j.cosrev.2023.100546 [22] Lautrup, A.D., Hyrup, T., Zimek, A., Schneider-Kamp, P.: Systematised review of generative modelling tools and utility metrics for fully synthetic tabular data. [under review] Gower [1971] Gower, J.C.: A general coefficient of similarity and some of its properties. Biometrics 27(4), 857–871 (1971) https://doi.org/10.2307/2528823 Kamal et al. [2019] Kamal, S., ElEleimy, M., Hegazy, D., Nasr, M.: Hepatitis C Virus (HCV) for Egyptian patients. UCI Machine Learning Repository, dataset (2019). https://doi.org/10.24432/c5989v Chawla et al. [2002] Chawla, N.V., Bowyer, K.W., Hall, L.O., Kegelmeyer, W.P.: SMOTE: synthetic minority over-sampling technique. J. Artif. Intell. Res. 16, 321–357 (2002) https://doi.org/10.1613/jair.953 Yoon et al. [2020] Yoon, J., Drumright, L.N., Schaar, M.: Anonymization through data synthesis using generative adversarial networks (ADS-GAN). IEEE J. Biomed. Health Informatics 24(8), 2378–2388 (2020) https://doi.org/10.1109/JBHI.2020.2980262 Xu et al. [2019] Xu, L., Skoularidou, M., Cuesta-Infante, A., Veeramachaneni, K.: Modeling tabular data using conditional GAN. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 7333–7343 (2019) Goodfellow et al. 
[2014] Goodfellow, I.J., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., Bengio, Y.: Generative Adversarial Networks. arXiv, preprint (2014). https://doi.org/10.48550/arXiv.1406.2661 Sun and Erath [2015] Sun, L., Erath, A.: A bayesian network approach for population synthesis. Transportation Research Part C: Emerging Technologies 61, 49–62 (2015) https://doi.org/10.1016/j.trc.2015.10.010 Reiter [2005] Reiter, J.P.: Using cart to generate partially synthetic public use microdata. Journal of Official Statistics 21(3), 441–462 (2005) Drechsler and Reiter [2011] Drechsler, J., Reiter, J.P.: An empirical evaluation of easily implemented, nonparametric methods for generating synthetic datasets. Comput. Stat. Data Anal. 55(12), 3232–3243 (2011) European Medicines Agency [2018] European Medicines Agency: External guidance on the implementation of the European Medicines Agency policy on the publication of clinical data for medicinal products for human use. https://www.ema.europa.eu/en/human-regulatory/marketing-authorisation/clinical-data-publication/support-industry/external-guidance-implementation-european-medicines-agency-policy-publication-clinical-data (2018) Health Canada [2019] Health Canada: Public Release of Clinical Information: guidance document. https://www.canada.ca/en/health-canada/services/drug-health-product-review-approval/profile-public-release-clinical-information-guidance/document.html (2019) Karr et al. [2006] Karr, A.F., Kohnen, C.N., Oganian, A., Reiter, J.P., Sanil, A.P.: A framework for evaluating the utility of data altered to protect confidentiality. The American Statistician 60(3), 224–232 (2006) https://doi.org/10.1198/000313006x124640 Zhu et al. [2022] Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) 
IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). https://doi.org/10.1109/BigData55660.2022.10020639 Villani [2009] Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. [2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. 
[2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Yale, A., Dash, S., Dutta, R., Guyon, I., Pavao, A., Bennett, K.P.: Privacy preserving synthetic health data. In: 27th European Symposium on Artificial Neural Networks, ESANN 2019, Bruges, Belgium, April 24-26, 2019 (2019) Yan et al. [2022] Yan, C., Yan, Y., Wan, Z., Zhang, Z., Omberg, L., Guinney, J., Mooney, S.D., Malin, B.A.: A multifaceted benchmarking of synthetic electronic health record generation models. Nature Communications 13(1) (2022) https://doi.org/10.1038/s41467-022-35295-1 Murtaza et al. [2023] Murtaza, H., Ahmed, M., Khan, N.F., Murtaza, G., Zafar, S., Bano, A.: Synthetic data generation: State of the art in health care domain. 
Computer Science Review 48, 100546 (2023) https://doi.org/10.1016/j.cosrev.2023.100546 [22] Lautrup, A.D., Hyrup, T., Zimek, A., Schneider-Kamp, P.: Systematised review of generative modelling tools and utility metrics for fully synthetic tabular data. [under review] Gower [1971] Gower, J.C.: A general coefficient of similarity and some of its properties. Biometrics 27(4), 857–871 (1971) https://doi.org/10.2307/2528823 Kamal et al. [2019] Kamal, S., ElEleimy, M., Hegazy, D., Nasr, M.: Hepatitis C Virus (HCV) for Egyptian patients. UCI Machine Learning Repository, dataset (2019). https://doi.org/10.24432/c5989v Chawla et al. [2002] Chawla, N.V., Bowyer, K.W., Hall, L.O., Kegelmeyer, W.P.: SMOTE: synthetic minority over-sampling technique. J. Artif. Intell. Res. 16, 321–357 (2002) https://doi.org/10.1613/jair.953 Yoon et al. [2020] Yoon, J., Drumright, L.N., Schaar, M.: Anonymization through data synthesis using generative adversarial networks (ADS-GAN). IEEE J. Biomed. Health Informatics 24(8), 2378–2388 (2020) https://doi.org/10.1109/JBHI.2020.2980262 Xu et al. [2019] Xu, L., Skoularidou, M., Cuesta-Infante, A., Veeramachaneni, K.: Modeling tabular data using conditional GAN. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 7333–7343 (2019) Goodfellow et al. [2014] Goodfellow, I.J., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., Bengio, Y.: Generative Adversarial Networks. arXiv, preprint (2014). https://doi.org/10.48550/arXiv.1406.2661 Sun and Erath [2015] Sun, L., Erath, A.: A bayesian network approach for population synthesis. 
Transportation Research Part C: Emerging Technologies 61, 49–62 (2015) https://doi.org/10.1016/j.trc.2015.10.010 Reiter [2005] Reiter, J.P.: Using cart to generate partially synthetic public use microdata. Journal of Official Statistics 21(3), 441–462 (2005) Drechsler and Reiter [2011] Drechsler, J., Reiter, J.P.: An empirical evaluation of easily implemented, nonparametric methods for generating synthetic datasets. Comput. Stat. Data Anal. 55(12), 3232–3243 (2011) European Medicines Agency [2018] European Medicines Agency: External guidance on the implementation of the European Medicines Agency policy on the publication of clinical data for medicinal products for human use. https://www.ema.europa.eu/en/human-regulatory/marketing-authorisation/clinical-data-publication/support-industry/external-guidance-implementation-european-medicines-agency-policy-publication-clinical-data (2018) Health Canada [2019] Health Canada: Public Release of Clinical Information: guidance document. https://www.canada.ca/en/health-canada/services/drug-health-product-review-approval/profile-public-release-clinical-information-guidance/document.html (2019) Karr et al. [2006] Karr, A.F., Kohnen, C.N., Oganian, A., Reiter, J.P., Sanil, A.P.: A framework for evaluating the utility of data altered to protect confidentiality. The American Statistician 60(3), 224–232 (2006) https://doi.org/10.1198/000313006x124640 Zhu et al. [2022] Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). https://doi.org/10.1109/BigData55660.2022.10020639 Villani [2009] Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. 
[2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). 
Transportation Research Part C: Emerging Technologies 61, 49–62 (2015) https://doi.org/10.1016/j.trc.2015.10.010 Reiter [2005] Reiter, J.P.: Using cart to generate partially synthetic public use microdata. Journal of Official Statistics 21(3), 441–462 (2005) Drechsler and Reiter [2011] Drechsler, J., Reiter, J.P.: An empirical evaluation of easily implemented, nonparametric methods for generating synthetic datasets. Comput. Stat. Data Anal. 55(12), 3232–3243 (2011) European Medicines Agency [2018] European Medicines Agency: External guidance on the implementation of the European Medicines Agency policy on the publication of clinical data for medicinal products for human use. https://www.ema.europa.eu/en/human-regulatory/marketing-authorisation/clinical-data-publication/support-industry/external-guidance-implementation-european-medicines-agency-policy-publication-clinical-data (2018) Health Canada [2019] Health Canada: Public Release of Clinical Information: guidance document. https://www.canada.ca/en/health-canada/services/drug-health-product-review-approval/profile-public-release-clinical-information-guidance/document.html (2019) Karr et al. [2006] Karr, A.F., Kohnen, C.N., Oganian, A., Reiter, J.P., Sanil, A.P.: A framework for evaluating the utility of data altered to protect confidentiality. The American Statistician 60(3), 224–232 (2006) https://doi.org/10.1198/000313006x124640 Zhu et al. [2022] Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). https://doi.org/10.1109/BigData55660.2022.10020639 Villani [2009] Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. 
[2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). 
https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Xu, L., Skoularidou, M., Cuesta-Infante, A., Veeramachaneni, K.: Modeling tabular data using conditional GAN. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 7333–7343 (2019) Goodfellow et al. [2014] Goodfellow, I.J., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., Bengio, Y.: Generative Adversarial Networks. arXiv, preprint (2014). https://doi.org/10.48550/arXiv.1406.2661 Sun and Erath [2015] Sun, L., Erath, A.: A bayesian network approach for population synthesis. Transportation Research Part C: Emerging Technologies 61, 49–62 (2015) https://doi.org/10.1016/j.trc.2015.10.010 Reiter [2005] Reiter, J.P.: Using cart to generate partially synthetic public use microdata. 
Journal of Official Statistics 21(3), 441–462 (2005) Drechsler and Reiter [2011] Drechsler, J., Reiter, J.P.: An empirical evaluation of easily implemented, nonparametric methods for generating synthetic datasets. Comput. Stat. Data Anal. 55(12), 3232–3243 (2011) European Medicines Agency [2018] European Medicines Agency: External guidance on the implementation of the European Medicines Agency policy on the publication of clinical data for medicinal products for human use. https://www.ema.europa.eu/en/human-regulatory/marketing-authorisation/clinical-data-publication/support-industry/external-guidance-implementation-european-medicines-agency-policy-publication-clinical-data (2018) Health Canada [2019] Health Canada: Public Release of Clinical Information: guidance document. https://www.canada.ca/en/health-canada/services/drug-health-product-review-approval/profile-public-release-clinical-information-guidance/document.html (2019) Karr et al. [2006] Karr, A.F., Kohnen, C.N., Oganian, A., Reiter, J.P., Sanil, A.P.: A framework for evaluating the utility of data altered to protect confidentiality. The American Statistician 60(3), 224–232 (2006) https://doi.org/10.1198/000313006x124640 Zhu et al. [2022] Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). https://doi.org/10.1109/BigData55660.2022.10020639 Villani [2009] Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. [2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. 
Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 
13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Goodfellow, I.J., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., Bengio, Y.: Generative Adversarial Networks. arXiv, preprint (2014). https://doi.org/10.48550/arXiv.1406.2661 Sun and Erath [2015] Sun, L., Erath, A.: A bayesian network approach for population synthesis. Transportation Research Part C: Emerging Technologies 61, 49–62 (2015) https://doi.org/10.1016/j.trc.2015.10.010 Reiter [2005] Reiter, J.P.: Using cart to generate partially synthetic public use microdata. Journal of Official Statistics 21(3), 441–462 (2005) Drechsler and Reiter [2011] Drechsler, J., Reiter, J.P.: An empirical evaluation of easily implemented, nonparametric methods for generating synthetic datasets. Comput. Stat. Data Anal. 55(12), 3232–3243 (2011) European Medicines Agency [2018] European Medicines Agency: External guidance on the implementation of the European Medicines Agency policy on the publication of clinical data for medicinal products for human use. 
https://www.ema.europa.eu/en/human-regulatory/marketing-authorisation/clinical-data-publication/support-industry/external-guidance-implementation-european-medicines-agency-policy-publication-clinical-data (2018) Health Canada [2019] Health Canada: Public Release of Clinical Information: guidance document. https://www.canada.ca/en/health-canada/services/drug-health-product-review-approval/profile-public-release-clinical-information-guidance/document.html (2019) Karr et al. [2006] Karr, A.F., Kohnen, C.N., Oganian, A., Reiter, J.P., Sanil, A.P.: A framework for evaluating the utility of data altered to protect confidentiality. The American Statistician 60(3), 224–232 (2006) https://doi.org/10.1198/000313006x124640 Zhu et al. [2022] Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). https://doi.org/10.1109/BigData55660.2022.10020639 Villani [2009] Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. [2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. 
[2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. 
[2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Sun, L., Erath, A.: A bayesian network approach for population synthesis. Transportation Research Part C: Emerging Technologies 61, 49–62 (2015) https://doi.org/10.1016/j.trc.2015.10.010 Reiter [2005] Reiter, J.P.: Using cart to generate partially synthetic public use microdata. Journal of Official Statistics 21(3), 441–462 (2005) Drechsler and Reiter [2011] Drechsler, J., Reiter, J.P.: An empirical evaluation of easily implemented, nonparametric methods for generating synthetic datasets. Comput. Stat. Data Anal. 55(12), 3232–3243 (2011) European Medicines Agency [2018] European Medicines Agency: External guidance on the implementation of the European Medicines Agency policy on the publication of clinical data for medicinal products for human use. https://www.ema.europa.eu/en/human-regulatory/marketing-authorisation/clinical-data-publication/support-industry/external-guidance-implementation-european-medicines-agency-policy-publication-clinical-data (2018) Health Canada [2019] Health Canada: Public Release of Clinical Information: guidance document. https://www.canada.ca/en/health-canada/services/drug-health-product-review-approval/profile-public-release-clinical-information-guidance/document.html (2019) Karr et al. [2006] Karr, A.F., Kohnen, C.N., Oganian, A., Reiter, J.P., Sanil, A.P.: A framework for evaluating the utility of data altered to protect confidentiality. The American Statistician 60(3), 224–232 (2006) https://doi.org/10.1198/000313006x124640 Zhu et al. [2022] Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. 
IEEE, Osaka, Japan, December 17-20 (2022). https://doi.org/10.1109/BigData55660.2022.10020639 Villani [2009] Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. [2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. 
[2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Reiter, J.P.: Using cart to generate partially synthetic public use microdata. Journal of Official Statistics 21(3), 441–462 (2005) Drechsler and Reiter [2011] Drechsler, J., Reiter, J.P.: An empirical evaluation of easily implemented, nonparametric methods for generating synthetic datasets. Comput. Stat. Data Anal. 55(12), 3232–3243 (2011) European Medicines Agency [2018] European Medicines Agency: External guidance on the implementation of the European Medicines Agency policy on the publication of clinical data for medicinal products for human use. 
https://www.ema.europa.eu/en/human-regulatory/marketing-authorisation/clinical-data-publication/support-industry/external-guidance-implementation-european-medicines-agency-policy-publication-clinical-data (2018) Health Canada [2019] Health Canada: Public Release of Clinical Information: guidance document. https://www.canada.ca/en/health-canada/services/drug-health-product-review-approval/profile-public-release-clinical-information-guidance/document.html (2019) Karr et al. [2006] Karr, A.F., Kohnen, C.N., Oganian, A., Reiter, J.P., Sanil, A.P.: A framework for evaluating the utility of data altered to protect confidentiality. The American Statistician 60(3), 224–232 (2006) https://doi.org/10.1198/000313006x124640 Zhu et al. [2022] Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). https://doi.org/10.1109/BigData55660.2022.10020639 Villani [2009] Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. [2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. 
[2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. 
[2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Drechsler, J., Reiter, J.P.: An empirical evaluation of easily implemented, nonparametric methods for generating synthetic datasets. Comput. Stat. Data Anal. 55(12), 3232–3243 (2011) European Medicines Agency [2018] European Medicines Agency: External guidance on the implementation of the European Medicines Agency policy on the publication of clinical data for medicinal products for human use. https://www.ema.europa.eu/en/human-regulatory/marketing-authorisation/clinical-data-publication/support-industry/external-guidance-implementation-european-medicines-agency-policy-publication-clinical-data (2018) Health Canada [2019] Health Canada: Public Release of Clinical Information: guidance document. https://www.canada.ca/en/health-canada/services/drug-health-product-review-approval/profile-public-release-clinical-information-guidance/document.html (2019) Karr et al. [2006] Karr, A.F., Kohnen, C.N., Oganian, A., Reiter, J.P., Sanil, A.P.: A framework for evaluating the utility of data altered to protect confidentiality. The American Statistician 60(3), 224–232 (2006) https://doi.org/10.1198/000313006x124640 Zhu et al. [2022] Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). https://doi.org/10.1109/BigData55660.2022.10020639 Villani [2009] Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. [2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. 
In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. 
[2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083
  8. Breugel, B., Kyono, T., Berrevoets, J., Schaar, M.: DECAF: generating fair synthetic data using causally-aware generative networks. In: Ranzato, M., Beygelzimer, A., Dauphin, Y.N., Liang, P., Vaughan, J.W. (eds.) Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6-14, 2021, Virtual, pp. 22221–22233 (2021)
[2020] Yoon, J., Drumright, L.N., Schaar, M.: Anonymization through data synthesis using generative adversarial networks (ADS-GAN). IEEE J. Biomed. Health Informatics 24(8), 2378–2388 (2020) https://doi.org/10.1109/JBHI.2020.2980262 Xu et al. [2019] Xu, L., Skoularidou, M., Cuesta-Infante, A., Veeramachaneni, K.: Modeling tabular data using conditional GAN. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 7333–7343 (2019) Goodfellow et al. [2014] Goodfellow, I.J., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., Bengio, Y.: Generative Adversarial Networks. arXiv, preprint (2014). https://doi.org/10.48550/arXiv.1406.2661 Sun and Erath [2015] Sun, L., Erath, A.: A bayesian network approach for population synthesis. Transportation Research Part C: Emerging Technologies 61, 49–62 (2015) https://doi.org/10.1016/j.trc.2015.10.010 Reiter [2005] Reiter, J.P.: Using cart to generate partially synthetic public use microdata. Journal of Official Statistics 21(3), 441–462 (2005) Drechsler and Reiter [2011] Drechsler, J., Reiter, J.P.: An empirical evaluation of easily implemented, nonparametric methods for generating synthetic datasets. Comput. Stat. Data Anal. 55(12), 3232–3243 (2011) European Medicines Agency [2018] European Medicines Agency: External guidance on the implementation of the European Medicines Agency policy on the publication of clinical data for medicinal products for human use. https://www.ema.europa.eu/en/human-regulatory/marketing-authorisation/clinical-data-publication/support-industry/external-guidance-implementation-european-medicines-agency-policy-publication-clinical-data (2018) Health Canada [2019] Health Canada: Public Release of Clinical Information: guidance document. 
https://www.canada.ca/en/health-canada/services/drug-health-product-review-approval/profile-public-release-clinical-information-guidance/document.html (2019) Karr et al. [2006] Karr, A.F., Kohnen, C.N., Oganian, A., Reiter, J.P., Sanil, A.P.: A framework for evaluating the utility of data altered to protect confidentiality. The American Statistician 60(3), 224–232 (2006) https://doi.org/10.1198/000313006x124640 Zhu et al. [2022] Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). https://doi.org/10.1109/BigData55660.2022.10020639 Villani [2009] Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. [2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. 
arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. 
JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Yale, A., Dash, S., Bhanot, K., Guyon, I., Erickson, J.S., Bennett, K.P.: Synthesizing quality open data assets from private health research studies. In: Abramowicz, W., Klein, G. (eds.) Business Information Systems Workshops - BIS 2020 International Workshops, Colorado Springs, CO, USA, June 8-10, 2020, Revised Selected Papers. Lecture Notes in Business Information Processing, vol. 394, pp. 324–335 (2020). https://doi.org/10.1007/978-3-030-61146-0_26 Lenz et al. [2021] Lenz, S., Hess, M., Binder, H.: Deep generative models in DataSHIELD. BMC Medical Research Methodology 21(1) (2021) https://doi.org/10.1186/s12874-021-01237-6 Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X., El-Hussuna, A.: Utility metrics for evaluating synthetic health data generation methods: Validation study. JMIR Medical Informatics 10(4), 35734 (2022) https://doi.org/10.2196/35734 Qian et al. [2023] Qian, Z., Cebere, B., Schaar, M.: Synthcity: facilitating innovative use cases of synthetic data in different data modalities. arXiv, preprint (2023). https://doi.org/10.48550/arXiv.2301.07573 DataCebo, Inc. [2023] DataCebo, Inc.: Synthetic Data Metrics. (2023). DataCebo, Inc.. Version 0.9.3. https://docs.sdv.dev/sdmetrics/ Brenninkmeijer [2021] Brenninkmeijer, B.: Table Evaluator. GitHub (2021) Yale et al. [2019] Yale, A., Dash, S., Dutta, R., Guyon, I., Pavao, A., Bennett, K.P.: Privacy preserving synthetic health data. In: 27th European Symposium on Artificial Neural Networks, ESANN 2019, Bruges, Belgium, April 24-26, 2019 (2019) Yan et al. [2022] Yan, C., Yan, Y., Wan, Z., Zhang, Z., Omberg, L., Guinney, J., Mooney, S.D., Malin, B.A.: A multifaceted benchmarking of synthetic electronic health record generation models. Nature Communications 13(1) (2022) https://doi.org/10.1038/s41467-022-35295-1 Murtaza et al. 
[2023] Murtaza, H., Ahmed, M., Khan, N.F., Murtaza, G., Zafar, S., Bano, A.: Synthetic data generation: State of the art in health care domain. Computer Science Review 48, 100546 (2023) https://doi.org/10.1016/j.cosrev.2023.100546 [22] Lautrup, A.D., Hyrup, T., Zimek, A., Schneider-Kamp, P.: Systematised review of generative modelling tools and utility metrics for fully synthetic tabular data. [under review] Gower [1971] Gower, J.C.: A general coefficient of similarity and some of its properties. Biometrics 27(4), 857–871 (1971) https://doi.org/10.2307/2528823 Kamal et al. [2019] Kamal, S., ElEleimy, M., Hegazy, D., Nasr, M.: Hepatitis C Virus (HCV) for Egyptian patients. UCI Machine Learning Repository, dataset (2019). https://doi.org/10.24432/c5989v Chawla et al. [2002] Chawla, N.V., Bowyer, K.W., Hall, L.O., Kegelmeyer, W.P.: SMOTE: synthetic minority over-sampling technique. J. Artif. Intell. Res. 16, 321–357 (2002) https://doi.org/10.1613/jair.953 Yoon et al. [2020] Yoon, J., Drumright, L.N., Schaar, M.: Anonymization through data synthesis using generative adversarial networks (ADS-GAN). IEEE J. Biomed. Health Informatics 24(8), 2378–2388 (2020) https://doi.org/10.1109/JBHI.2020.2980262 Xu et al. [2019] Xu, L., Skoularidou, M., Cuesta-Infante, A., Veeramachaneni, K.: Modeling tabular data using conditional GAN. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 7333–7343 (2019) Goodfellow et al. [2014] Goodfellow, I.J., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., Bengio, Y.: Generative Adversarial Networks. arXiv, preprint (2014). https://doi.org/10.48550/arXiv.1406.2661 Sun and Erath [2015] Sun, L., Erath, A.: A bayesian network approach for population synthesis. 
Transportation Research Part C: Emerging Technologies 61, 49–62 (2015) https://doi.org/10.1016/j.trc.2015.10.010 Reiter [2005] Reiter, J.P.: Using cart to generate partially synthetic public use microdata. Journal of Official Statistics 21(3), 441–462 (2005) Drechsler and Reiter [2011] Drechsler, J., Reiter, J.P.: An empirical evaluation of easily implemented, nonparametric methods for generating synthetic datasets. Comput. Stat. Data Anal. 55(12), 3232–3243 (2011) European Medicines Agency [2018] European Medicines Agency: External guidance on the implementation of the European Medicines Agency policy on the publication of clinical data for medicinal products for human use. https://www.ema.europa.eu/en/human-regulatory/marketing-authorisation/clinical-data-publication/support-industry/external-guidance-implementation-european-medicines-agency-policy-publication-clinical-data (2018) Health Canada [2019] Health Canada: Public Release of Clinical Information: guidance document. https://www.canada.ca/en/health-canada/services/drug-health-product-review-approval/profile-public-release-clinical-information-guidance/document.html (2019) Karr et al. [2006] Karr, A.F., Kohnen, C.N., Oganian, A., Reiter, J.P., Sanil, A.P.: A framework for evaluating the utility of data altered to protect confidentiality. The American Statistician 60(3), 224–232 (2006) https://doi.org/10.1198/000313006x124640 Zhu et al. [2022] Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). https://doi.org/10.1109/BigData55660.2022.10020639 Villani [2009] Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. 
[2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). 
https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Lenz, S., Hess, M., Binder, H.: Deep generative models in DataSHIELD. BMC Medical Research Methodology 21(1) (2021) https://doi.org/10.1186/s12874-021-01237-6 Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X., El-Hussuna, A.: Utility metrics for evaluating synthetic health data generation methods: Validation study. JMIR Medical Informatics 10(4), 35734 (2022) https://doi.org/10.2196/35734 Qian et al. [2023] Qian, Z., Cebere, B., Schaar, M.: Synthcity: facilitating innovative use cases of synthetic data in different data modalities. arXiv, preprint (2023). https://doi.org/10.48550/arXiv.2301.07573 DataCebo, Inc. [2023] DataCebo, Inc.: Synthetic Data Metrics. (2023). DataCebo, Inc.. Version 0.9.3. https://docs.sdv.dev/sdmetrics/ Brenninkmeijer [2021] Brenninkmeijer, B.: Table Evaluator. GitHub (2021) Yale et al. [2019] Yale, A., Dash, S., Dutta, R., Guyon, I., Pavao, A., Bennett, K.P.: Privacy preserving synthetic health data. 
In: 27th European Symposium on Artificial Neural Networks, ESANN 2019, Bruges, Belgium, April 24-26, 2019 (2019) Yan et al. [2022] Yan, C., Yan, Y., Wan, Z., Zhang, Z., Omberg, L., Guinney, J., Mooney, S.D., Malin, B.A.: A multifaceted benchmarking of synthetic electronic health record generation models. Nature Communications 13(1) (2022) https://doi.org/10.1038/s41467-022-35295-1 Murtaza et al. [2023] Murtaza, H., Ahmed, M., Khan, N.F., Murtaza, G., Zafar, S., Bano, A.: Synthetic data generation: State of the art in health care domain. Computer Science Review 48, 100546 (2023) https://doi.org/10.1016/j.cosrev.2023.100546 [22] Lautrup, A.D., Hyrup, T., Zimek, A., Schneider-Kamp, P.: Systematised review of generative modelling tools and utility metrics for fully synthetic tabular data. [under review] Gower [1971] Gower, J.C.: A general coefficient of similarity and some of its properties. Biometrics 27(4), 857–871 (1971) https://doi.org/10.2307/2528823 Kamal et al. [2019] Kamal, S., ElEleimy, M., Hegazy, D., Nasr, M.: Hepatitis C Virus (HCV) for Egyptian patients. UCI Machine Learning Repository, dataset (2019). https://doi.org/10.24432/c5989v Chawla et al. [2002] Chawla, N.V., Bowyer, K.W., Hall, L.O., Kegelmeyer, W.P.: SMOTE: synthetic minority over-sampling technique. J. Artif. Intell. Res. 16, 321–357 (2002) https://doi.org/10.1613/jair.953 Yoon et al. [2020] Yoon, J., Drumright, L.N., Schaar, M.: Anonymization through data synthesis using generative adversarial networks (ADS-GAN). IEEE J. Biomed. Health Informatics 24(8), 2378–2388 (2020) https://doi.org/10.1109/JBHI.2020.2980262 Xu et al. [2019] Xu, L., Skoularidou, M., Cuesta-Infante, A., Veeramachaneni, K.: Modeling tabular data using conditional GAN. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) 
Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 7333–7343 (2019) Goodfellow et al. [2014] Goodfellow, I.J., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., Bengio, Y.: Generative Adversarial Networks. arXiv, preprint (2014). https://doi.org/10.48550/arXiv.1406.2661 Sun and Erath [2015] Sun, L., Erath, A.: A bayesian network approach for population synthesis. Transportation Research Part C: Emerging Technologies 61, 49–62 (2015) https://doi.org/10.1016/j.trc.2015.10.010 Reiter [2005] Reiter, J.P.: Using cart to generate partially synthetic public use microdata. Journal of Official Statistics 21(3), 441–462 (2005) Drechsler and Reiter [2011] Drechsler, J., Reiter, J.P.: An empirical evaluation of easily implemented, nonparametric methods for generating synthetic datasets. Comput. Stat. Data Anal. 55(12), 3232–3243 (2011) European Medicines Agency [2018] European Medicines Agency: External guidance on the implementation of the European Medicines Agency policy on the publication of clinical data for medicinal products for human use. https://www.ema.europa.eu/en/human-regulatory/marketing-authorisation/clinical-data-publication/support-industry/external-guidance-implementation-european-medicines-agency-policy-publication-clinical-data (2018) Health Canada [2019] Health Canada: Public Release of Clinical Information: guidance document. https://www.canada.ca/en/health-canada/services/drug-health-product-review-approval/profile-public-release-clinical-information-guidance/document.html (2019) Karr et al. [2006] Karr, A.F., Kohnen, C.N., Oganian, A., Reiter, J.P., Sanil, A.P.: A framework for evaluating the utility of data altered to protect confidentiality. The American Statistician 60(3), 224–232 (2006) https://doi.org/10.1198/000313006x124640 Zhu et al. 
[2022] Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). https://doi.org/10.1109/BigData55660.2022.10020639 Villani [2009] Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. [2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. 
Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Emam, K.E., Mosquera, L., Fang, X., El-Hussuna, A.: Utility metrics for evaluating synthetic health data generation methods: Validation study. JMIR Medical Informatics 10(4), 35734 (2022) https://doi.org/10.2196/35734 Qian et al. 
[2023] Qian, Z., Cebere, B., Schaar, M.: Synthcity: facilitating innovative use cases of synthetic data in different data modalities. arXiv, preprint (2023). https://doi.org/10.48550/arXiv.2301.07573 DataCebo, Inc. [2023] DataCebo, Inc.: Synthetic Data Metrics. (2023). DataCebo, Inc.. Version 0.9.3. https://docs.sdv.dev/sdmetrics/ Brenninkmeijer [2021] Brenninkmeijer, B.: Table Evaluator. GitHub (2021) Yale et al. [2019] Yale, A., Dash, S., Dutta, R., Guyon, I., Pavao, A., Bennett, K.P.: Privacy preserving synthetic health data. In: 27th European Symposium on Artificial Neural Networks, ESANN 2019, Bruges, Belgium, April 24-26, 2019 (2019) Yan et al. [2022] Yan, C., Yan, Y., Wan, Z., Zhang, Z., Omberg, L., Guinney, J., Mooney, S.D., Malin, B.A.: A multifaceted benchmarking of synthetic electronic health record generation models. Nature Communications 13(1) (2022) https://doi.org/10.1038/s41467-022-35295-1 Murtaza et al. [2023] Murtaza, H., Ahmed, M., Khan, N.F., Murtaza, G., Zafar, S., Bano, A.: Synthetic data generation: State of the art in health care domain. Computer Science Review 48, 100546 (2023) https://doi.org/10.1016/j.cosrev.2023.100546 [22] Lautrup, A.D., Hyrup, T., Zimek, A., Schneider-Kamp, P.: Systematised review of generative modelling tools and utility metrics for fully synthetic tabular data. [under review] Gower [1971] Gower, J.C.: A general coefficient of similarity and some of its properties. Biometrics 27(4), 857–871 (1971) https://doi.org/10.2307/2528823 Kamal et al. [2019] Kamal, S., ElEleimy, M., Hegazy, D., Nasr, M.: Hepatitis C Virus (HCV) for Egyptian patients. UCI Machine Learning Repository, dataset (2019). https://doi.org/10.24432/c5989v Chawla et al. [2002] Chawla, N.V., Bowyer, K.W., Hall, L.O., Kegelmeyer, W.P.: SMOTE: synthetic minority over-sampling technique. J. Artif. Intell. Res. 16, 321–357 (2002) https://doi.org/10.1613/jair.953 Yoon et al. 
[2020] Yoon, J., Drumright, L.N., Schaar, M.: Anonymization through data synthesis using generative adversarial networks (ADS-GAN). IEEE J. Biomed. Health Informatics 24(8), 2378–2388 (2020) https://doi.org/10.1109/JBHI.2020.2980262 Xu et al. [2019] Xu, L., Skoularidou, M., Cuesta-Infante, A., Veeramachaneni, K.: Modeling tabular data using conditional GAN. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 7333–7343 (2019) Goodfellow et al. [2014] Goodfellow, I.J., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., Bengio, Y.: Generative Adversarial Networks. arXiv, preprint (2014). https://doi.org/10.48550/arXiv.1406.2661 Sun and Erath [2015] Sun, L., Erath, A.: A bayesian network approach for population synthesis. Transportation Research Part C: Emerging Technologies 61, 49–62 (2015) https://doi.org/10.1016/j.trc.2015.10.010 Reiter [2005] Reiter, J.P.: Using cart to generate partially synthetic public use microdata. Journal of Official Statistics 21(3), 441–462 (2005) Drechsler and Reiter [2011] Drechsler, J., Reiter, J.P.: An empirical evaluation of easily implemented, nonparametric methods for generating synthetic datasets. Comput. Stat. Data Anal. 55(12), 3232–3243 (2011) European Medicines Agency [2018] European Medicines Agency: External guidance on the implementation of the European Medicines Agency policy on the publication of clinical data for medicinal products for human use. https://www.ema.europa.eu/en/human-regulatory/marketing-authorisation/clinical-data-publication/support-industry/external-guidance-implementation-european-medicines-agency-policy-publication-clinical-data (2018) Health Canada [2019] Health Canada: Public Release of Clinical Information: guidance document. 
[2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Murtaza, H., Ahmed, M., Khan, N.F., Murtaza, G., Zafar, S., Bano, A.: Synthetic data generation: State of the art in health care domain. Computer Science Review 48, 100546 (2023) https://doi.org/10.1016/j.cosrev.2023.100546 [22] Lautrup, A.D., Hyrup, T., Zimek, A., Schneider-Kamp, P.: Systematised review of generative modelling tools and utility metrics for fully synthetic tabular data. [under review] Gower [1971] Gower, J.C.: A general coefficient of similarity and some of its properties. Biometrics 27(4), 857–871 (1971) https://doi.org/10.2307/2528823 Kamal et al. [2019] Kamal, S., ElEleimy, M., Hegazy, D., Nasr, M.: Hepatitis C Virus (HCV) for Egyptian patients. UCI Machine Learning Repository, dataset (2019). https://doi.org/10.24432/c5989v Chawla et al. [2002] Chawla, N.V., Bowyer, K.W., Hall, L.O., Kegelmeyer, W.P.: SMOTE: synthetic minority over-sampling technique. J. Artif. Intell. Res. 16, 321–357 (2002) https://doi.org/10.1613/jair.953 Yoon et al. [2020] Yoon, J., Drumright, L.N., Schaar, M.: Anonymization through data synthesis using generative adversarial networks (ADS-GAN). IEEE J. Biomed. Health Informatics 24(8), 2378–2388 (2020) https://doi.org/10.1109/JBHI.2020.2980262 Xu et al. [2019] Xu, L., Skoularidou, M., Cuesta-Infante, A., Veeramachaneni, K.: Modeling tabular data using conditional GAN. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 7333–7343 (2019) Goodfellow et al. [2014] Goodfellow, I.J., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., Bengio, Y.: Generative Adversarial Networks. 
arXiv, preprint (2014). https://doi.org/10.48550/arXiv.1406.2661 Sun and Erath [2015] Sun, L., Erath, A.: A bayesian network approach for population synthesis. Transportation Research Part C: Emerging Technologies 61, 49–62 (2015) https://doi.org/10.1016/j.trc.2015.10.010 Reiter [2005] Reiter, J.P.: Using cart to generate partially synthetic public use microdata. Journal of Official Statistics 21(3), 441–462 (2005) Drechsler and Reiter [2011] Drechsler, J., Reiter, J.P.: An empirical evaluation of easily implemented, nonparametric methods for generating synthetic datasets. Comput. Stat. Data Anal. 55(12), 3232–3243 (2011) European Medicines Agency [2018] European Medicines Agency: External guidance on the implementation of the European Medicines Agency policy on the publication of clinical data for medicinal products for human use. https://www.ema.europa.eu/en/human-regulatory/marketing-authorisation/clinical-data-publication/support-industry/external-guidance-implementation-european-medicines-agency-policy-publication-clinical-data (2018) Health Canada [2019] Health Canada: Public Release of Clinical Information: guidance document. https://www.canada.ca/en/health-canada/services/drug-health-product-review-approval/profile-public-release-clinical-information-guidance/document.html (2019) Karr et al. [2006] Karr, A.F., Kohnen, C.N., Oganian, A., Reiter, J.P., Sanil, A.P.: A framework for evaluating the utility of data altered to protect confidentiality. The American Statistician 60(3), 224–232 (2006) https://doi.org/10.1198/000313006x124640 Zhu et al. [2022] Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). 
https://doi.org/10.1109/BigData55660.2022.10020639 Villani [2009] Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. [2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. 
In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Lautrup, A.D., Hyrup, T., Zimek, A., Schneider-Kamp, P.: Systematised review of generative modelling tools and utility metrics for fully synthetic tabular data. [under review] Gower [1971] Gower, J.C.: A general coefficient of similarity and some of its properties. Biometrics 27(4), 857–871 (1971) https://doi.org/10.2307/2528823 Kamal et al. [2019] Kamal, S., ElEleimy, M., Hegazy, D., Nasr, M.: Hepatitis C Virus (HCV) for Egyptian patients. UCI Machine Learning Repository, dataset (2019). https://doi.org/10.24432/c5989v Chawla et al. [2002] Chawla, N.V., Bowyer, K.W., Hall, L.O., Kegelmeyer, W.P.: SMOTE: synthetic minority over-sampling technique. J. Artif. Intell. Res. 16, 321–357 (2002) https://doi.org/10.1613/jair.953 Yoon et al. [2020] Yoon, J., Drumright, L.N., Schaar, M.: Anonymization through data synthesis using generative adversarial networks (ADS-GAN). IEEE J. Biomed. 
Health Informatics 24(8), 2378–2388 (2020) https://doi.org/10.1109/JBHI.2020.2980262 Xu et al. [2019] Xu, L., Skoularidou, M., Cuesta-Infante, A., Veeramachaneni, K.: Modeling tabular data using conditional GAN. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 7333–7343 (2019) Goodfellow et al. [2014] Goodfellow, I.J., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., Bengio, Y.: Generative Adversarial Networks. arXiv, preprint (2014). https://doi.org/10.48550/arXiv.1406.2661 Sun and Erath [2015] Sun, L., Erath, A.: A bayesian network approach for population synthesis. Transportation Research Part C: Emerging Technologies 61, 49–62 (2015) https://doi.org/10.1016/j.trc.2015.10.010 Reiter [2005] Reiter, J.P.: Using cart to generate partially synthetic public use microdata. Journal of Official Statistics 21(3), 441–462 (2005) Drechsler and Reiter [2011] Drechsler, J., Reiter, J.P.: An empirical evaluation of easily implemented, nonparametric methods for generating synthetic datasets. Comput. Stat. Data Anal. 55(12), 3232–3243 (2011) European Medicines Agency [2018] European Medicines Agency: External guidance on the implementation of the European Medicines Agency policy on the publication of clinical data for medicinal products for human use. https://www.ema.europa.eu/en/human-regulatory/marketing-authorisation/clinical-data-publication/support-industry/external-guidance-implementation-european-medicines-agency-policy-publication-clinical-data (2018) Health Canada [2019] Health Canada: Public Release of Clinical Information: guidance document. https://www.canada.ca/en/health-canada/services/drug-health-product-review-approval/profile-public-release-clinical-information-guidance/document.html (2019) Karr et al. 
[2006] Karr, A.F., Kohnen, C.N., Oganian, A., Reiter, J.P., Sanil, A.P.: A framework for evaluating the utility of data altered to protect confidentiality. The American Statistician 60(3), 224–232 (2006) https://doi.org/10.1198/000313006x124640 Zhu et al. [2022] Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). https://doi.org/10.1109/BigData55660.2022.10020639 Villani [2009] Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. [2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. 
[2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Gower, J.C.: A general coefficient of similarity and some of its properties. Biometrics 27(4), 857–871 (1971) https://doi.org/10.2307/2528823 Kamal et al. 
[2019] Kamal, S., ElEleimy, M., Hegazy, D., Nasr, M.: Hepatitis C Virus (HCV) for Egyptian patients. UCI Machine Learning Repository, dataset (2019). https://doi.org/10.24432/c5989v Chawla et al. [2002] Chawla, N.V., Bowyer, K.W., Hall, L.O., Kegelmeyer, W.P.: SMOTE: synthetic minority over-sampling technique. J. Artif. Intell. Res. 16, 321–357 (2002) https://doi.org/10.1613/jair.953 Yoon et al. [2020] Yoon, J., Drumright, L.N., Schaar, M.: Anonymization through data synthesis using generative adversarial networks (ADS-GAN). IEEE J. Biomed. Health Informatics 24(8), 2378–2388 (2020) https://doi.org/10.1109/JBHI.2020.2980262 Xu et al. [2019] Xu, L., Skoularidou, M., Cuesta-Infante, A., Veeramachaneni, K.: Modeling tabular data using conditional GAN. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 7333–7343 (2019) Goodfellow et al. [2014] Goodfellow, I.J., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., Bengio, Y.: Generative Adversarial Networks. arXiv, preprint (2014). https://doi.org/10.48550/arXiv.1406.2661 Sun and Erath [2015] Sun, L., Erath, A.: A bayesian network approach for population synthesis. Transportation Research Part C: Emerging Technologies 61, 49–62 (2015) https://doi.org/10.1016/j.trc.2015.10.010 Reiter [2005] Reiter, J.P.: Using cart to generate partially synthetic public use microdata. Journal of Official Statistics 21(3), 441–462 (2005) Drechsler and Reiter [2011] Drechsler, J., Reiter, J.P.: An empirical evaluation of easily implemented, nonparametric methods for generating synthetic datasets. Comput. Stat. Data Anal. 
55(12), 3232–3243 (2011) European Medicines Agency [2018] European Medicines Agency: External guidance on the implementation of the European Medicines Agency policy on the publication of clinical data for medicinal products for human use. https://www.ema.europa.eu/en/human-regulatory/marketing-authorisation/clinical-data-publication/support-industry/external-guidance-implementation-european-medicines-agency-policy-publication-clinical-data (2018) Health Canada [2019] Health Canada: Public Release of Clinical Information: guidance document. https://www.canada.ca/en/health-canada/services/drug-health-product-review-approval/profile-public-release-clinical-information-guidance/document.html (2019) Karr et al. [2006] Karr, A.F., Kohnen, C.N., Oganian, A., Reiter, J.P., Sanil, A.P.: A framework for evaluating the utility of data altered to protect confidentiality. The American Statistician 60(3), 224–232 (2006) https://doi.org/10.1198/000313006x124640 Zhu et al. [2022] Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). https://doi.org/10.1109/BigData55660.2022.10020639 Villani [2009] Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. [2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. 
IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. 
[2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Kamal, S., ElEleimy, M., Hegazy, D., Nasr, M.: Hepatitis C Virus (HCV) for Egyptian patients. UCI Machine Learning Repository, dataset (2019). https://doi.org/10.24432/c5989v Chawla et al. [2002] Chawla, N.V., Bowyer, K.W., Hall, L.O., Kegelmeyer, W.P.: SMOTE: synthetic minority over-sampling technique. J. Artif. Intell. Res. 16, 321–357 (2002) https://doi.org/10.1613/jair.953 Yoon et al. [2020] Yoon, J., Drumright, L.N., Schaar, M.: Anonymization through data synthesis using generative adversarial networks (ADS-GAN). IEEE J. Biomed. Health Informatics 24(8), 2378–2388 (2020) https://doi.org/10.1109/JBHI.2020.2980262 Xu et al. [2019] Xu, L., Skoularidou, M., Cuesta-Infante, A., Veeramachaneni, K.: Modeling tabular data using conditional GAN. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 7333–7343 (2019) Goodfellow et al. [2014] Goodfellow, I.J., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., Bengio, Y.: Generative Adversarial Networks. arXiv, preprint (2014). https://doi.org/10.48550/arXiv.1406.2661 Sun and Erath [2015] Sun, L., Erath, A.: A bayesian network approach for population synthesis. 
Transportation Research Part C: Emerging Technologies 61, 49–62 (2015) https://doi.org/10.1016/j.trc.2015.10.010 Reiter [2005] Reiter, J.P.: Using cart to generate partially synthetic public use microdata. Journal of Official Statistics 21(3), 441–462 (2005) Drechsler and Reiter [2011] Drechsler, J., Reiter, J.P.: An empirical evaluation of easily implemented, nonparametric methods for generating synthetic datasets. Comput. Stat. Data Anal. 55(12), 3232–3243 (2011) European Medicines Agency [2018] European Medicines Agency: External guidance on the implementation of the European Medicines Agency policy on the publication of clinical data for medicinal products for human use. https://www.ema.europa.eu/en/human-regulatory/marketing-authorisation/clinical-data-publication/support-industry/external-guidance-implementation-european-medicines-agency-policy-publication-clinical-data (2018) Health Canada [2019] Health Canada: Public Release of Clinical Information: guidance document. https://www.canada.ca/en/health-canada/services/drug-health-product-review-approval/profile-public-release-clinical-information-guidance/document.html (2019) Karr et al. [2006] Karr, A.F., Kohnen, C.N., Oganian, A., Reiter, J.P., Sanil, A.P.: A framework for evaluating the utility of data altered to protect confidentiality. The American Statistician 60(3), 224–232 (2006) https://doi.org/10.1198/000313006x124640 Zhu et al. [2022] Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). https://doi.org/10.1109/BigData55660.2022.10020639 Villani [2009] Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. 
[2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). 
https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Chawla, N.V., Bowyer, K.W., Hall, L.O., Kegelmeyer, W.P.: SMOTE: synthetic minority over-sampling technique. J. Artif. Intell. Res. 16, 321–357 (2002) https://doi.org/10.1613/jair.953 Yoon et al. [2020] Yoon, J., Drumright, L.N., Schaar, M.: Anonymization through data synthesis using generative adversarial networks (ADS-GAN). IEEE J. Biomed. Health Informatics 24(8), 2378–2388 (2020) https://doi.org/10.1109/JBHI.2020.2980262 Xu et al. [2019] Xu, L., Skoularidou, M., Cuesta-Infante, A., Veeramachaneni, K.: Modeling tabular data using conditional GAN. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 7333–7343 (2019) Goodfellow et al. [2014] Goodfellow, I.J., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., Bengio, Y.: Generative Adversarial Networks. arXiv, preprint (2014). 
14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Health Canada: Public Release of Clinical Information: guidance document. https://www.canada.ca/en/health-canada/services/drug-health-product-review-approval/profile-public-release-clinical-information-guidance/document.html (2019) Karr et al. [2006] Karr, A.F., Kohnen, C.N., Oganian, A., Reiter, J.P., Sanil, A.P.: A framework for evaluating the utility of data altered to protect confidentiality. The American Statistician 60(3), 224–232 (2006) https://doi.org/10.1198/000313006x124640 Zhu et al. [2022] Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. 
In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). https://doi.org/10.1109/BigData55660.2022.10020639 Villani [2009] Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. [2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 
14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Karr, A.F., Kohnen, C.N., Oganian, A., Reiter, J.P., Sanil, A.P.: A framework for evaluating the utility of data altered to protect confidentiality. The American Statistician 60(3), 224–232 (2006) https://doi.org/10.1198/000313006x124640 Zhu et al. [2022] Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). 
https://doi.org/10.1109/BigData55660.2022.10020639 Villani [2009] Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. [2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. 
In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). https://doi.org/10.1109/BigData55660.2022.10020639 Villani [2009] Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. [2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. 
[2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. 
[2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. [2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. 
Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. 
[2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. 
[2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. 
[2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. 
Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). 
https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. 
[2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. 
Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. 
[2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. 
[2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. 
[2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083
  9. Dankar, F.K., Ibrahim, M.K., Ismail, L.: A multi-dimensional evaluation of synthetic data generators. IEEE Access 10, 11147–11158 (2022) https://doi.org/10.1109/access.2022.3144765
[2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). 
https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Lenz, S., Hess, M., Binder, H.: Deep generative models in DataSHIELD. BMC Medical Research Methodology 21(1) (2021) https://doi.org/10.1186/s12874-021-01237-6 Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X., El-Hussuna, A.: Utility metrics for evaluating synthetic health data generation methods: Validation study. JMIR Medical Informatics 10(4), 35734 (2022) https://doi.org/10.2196/35734 Qian et al. [2023] Qian, Z., Cebere, B., Schaar, M.: Synthcity: facilitating innovative use cases of synthetic data in different data modalities. arXiv, preprint (2023). https://doi.org/10.48550/arXiv.2301.07573 DataCebo, Inc. [2023] DataCebo, Inc.: Synthetic Data Metrics. (2023). DataCebo, Inc.. Version 0.9.3. https://docs.sdv.dev/sdmetrics/ Brenninkmeijer [2021] Brenninkmeijer, B.: Table Evaluator. GitHub (2021) Yale et al. [2019] Yale, A., Dash, S., Dutta, R., Guyon, I., Pavao, A., Bennett, K.P.: Privacy preserving synthetic health data. 
In: 27th European Symposium on Artificial Neural Networks, ESANN 2019, Bruges, Belgium, April 24-26, 2019 (2019) Yan et al. [2022] Yan, C., Yan, Y., Wan, Z., Zhang, Z., Omberg, L., Guinney, J., Mooney, S.D., Malin, B.A.: A multifaceted benchmarking of synthetic electronic health record generation models. Nature Communications 13(1) (2022) https://doi.org/10.1038/s41467-022-35295-1 Murtaza et al. [2023] Murtaza, H., Ahmed, M., Khan, N.F., Murtaza, G., Zafar, S., Bano, A.: Synthetic data generation: State of the art in health care domain. Computer Science Review 48, 100546 (2023) https://doi.org/10.1016/j.cosrev.2023.100546 [22] Lautrup, A.D., Hyrup, T., Zimek, A., Schneider-Kamp, P.: Systematised review of generative modelling tools and utility metrics for fully synthetic tabular data. [under review] Gower [1971] Gower, J.C.: A general coefficient of similarity and some of its properties. Biometrics 27(4), 857–871 (1971) https://doi.org/10.2307/2528823 Kamal et al. [2019] Kamal, S., ElEleimy, M., Hegazy, D., Nasr, M.: Hepatitis C Virus (HCV) for Egyptian patients. UCI Machine Learning Repository, dataset (2019). https://doi.org/10.24432/c5989v Chawla et al. [2002] Chawla, N.V., Bowyer, K.W., Hall, L.O., Kegelmeyer, W.P.: SMOTE: synthetic minority over-sampling technique. J. Artif. Intell. Res. 16, 321–357 (2002) https://doi.org/10.1613/jair.953 Yoon et al. [2020] Yoon, J., Drumright, L.N., Schaar, M.: Anonymization through data synthesis using generative adversarial networks (ADS-GAN). IEEE J. Biomed. Health Informatics 24(8), 2378–2388 (2020) https://doi.org/10.1109/JBHI.2020.2980262 Xu et al. [2019] Xu, L., Skoularidou, M., Cuesta-Infante, A., Veeramachaneni, K.: Modeling tabular data using conditional GAN. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) 
Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 7333–7343 (2019) Goodfellow et al. [2014] Goodfellow, I.J., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., Bengio, Y.: Generative Adversarial Networks. arXiv, preprint (2014). https://doi.org/10.48550/arXiv.1406.2661 Sun and Erath [2015] Sun, L., Erath, A.: A bayesian network approach for population synthesis. Transportation Research Part C: Emerging Technologies 61, 49–62 (2015) https://doi.org/10.1016/j.trc.2015.10.010 Reiter [2005] Reiter, J.P.: Using cart to generate partially synthetic public use microdata. Journal of Official Statistics 21(3), 441–462 (2005) Drechsler and Reiter [2011] Drechsler, J., Reiter, J.P.: An empirical evaluation of easily implemented, nonparametric methods for generating synthetic datasets. Comput. Stat. Data Anal. 55(12), 3232–3243 (2011) European Medicines Agency [2018] European Medicines Agency: External guidance on the implementation of the European Medicines Agency policy on the publication of clinical data for medicinal products for human use. https://www.ema.europa.eu/en/human-regulatory/marketing-authorisation/clinical-data-publication/support-industry/external-guidance-implementation-european-medicines-agency-policy-publication-clinical-data (2018) Health Canada [2019] Health Canada: Public Release of Clinical Information: guidance document. https://www.canada.ca/en/health-canada/services/drug-health-product-review-approval/profile-public-release-clinical-information-guidance/document.html (2019) Karr et al. [2006] Karr, A.F., Kohnen, C.N., Oganian, A., Reiter, J.P., Sanil, A.P.: A framework for evaluating the utility of data altered to protect confidentiality. The American Statistician 60(3), 224–232 (2006) https://doi.org/10.1198/000313006x124640 Zhu et al. 
[2022] Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). https://doi.org/10.1109/BigData55660.2022.10020639 Villani [2009] Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. [2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. 
Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Emam, K.E., Mosquera, L., Fang, X., El-Hussuna, A.: Utility metrics for evaluating synthetic health data generation methods: Validation study. JMIR Medical Informatics 10(4), 35734 (2022) https://doi.org/10.2196/35734 Qian et al. 
[2023] Qian, Z., Cebere, B., Schaar, M.: Synthcity: facilitating innovative use cases of synthetic data in different data modalities. arXiv, preprint (2023). https://doi.org/10.48550/arXiv.2301.07573 DataCebo, Inc. [2023] DataCebo, Inc.: Synthetic Data Metrics. (2023). DataCebo, Inc.. Version 0.9.3. https://docs.sdv.dev/sdmetrics/ Brenninkmeijer [2021] Brenninkmeijer, B.: Table Evaluator. GitHub (2021) Yale et al. [2019] Yale, A., Dash, S., Dutta, R., Guyon, I., Pavao, A., Bennett, K.P.: Privacy preserving synthetic health data. In: 27th European Symposium on Artificial Neural Networks, ESANN 2019, Bruges, Belgium, April 24-26, 2019 (2019) Yan et al. [2022] Yan, C., Yan, Y., Wan, Z., Zhang, Z., Omberg, L., Guinney, J., Mooney, S.D., Malin, B.A.: A multifaceted benchmarking of synthetic electronic health record generation models. Nature Communications 13(1) (2022) https://doi.org/10.1038/s41467-022-35295-1 Murtaza et al. [2023] Murtaza, H., Ahmed, M., Khan, N.F., Murtaza, G., Zafar, S., Bano, A.: Synthetic data generation: State of the art in health care domain. Computer Science Review 48, 100546 (2023) https://doi.org/10.1016/j.cosrev.2023.100546 [22] Lautrup, A.D., Hyrup, T., Zimek, A., Schneider-Kamp, P.: Systematised review of generative modelling tools and utility metrics for fully synthetic tabular data. [under review] Gower [1971] Gower, J.C.: A general coefficient of similarity and some of its properties. Biometrics 27(4), 857–871 (1971) https://doi.org/10.2307/2528823 Kamal et al. [2019] Kamal, S., ElEleimy, M., Hegazy, D., Nasr, M.: Hepatitis C Virus (HCV) for Egyptian patients. UCI Machine Learning Repository, dataset (2019). https://doi.org/10.24432/c5989v Chawla et al. [2002] Chawla, N.V., Bowyer, K.W., Hall, L.O., Kegelmeyer, W.P.: SMOTE: synthetic minority over-sampling technique. J. Artif. Intell. Res. 16, 321–357 (2002) https://doi.org/10.1613/jair.953 Yoon et al. 
[2020] Yoon, J., Drumright, L.N., Schaar, M.: Anonymization through data synthesis using generative adversarial networks (ADS-GAN). IEEE J. Biomed. Health Informatics 24(8), 2378–2388 (2020) https://doi.org/10.1109/JBHI.2020.2980262 Xu et al. [2019] Xu, L., Skoularidou, M., Cuesta-Infante, A., Veeramachaneni, K.: Modeling tabular data using conditional GAN. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 7333–7343 (2019) Goodfellow et al. [2014] Goodfellow, I.J., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., Bengio, Y.: Generative Adversarial Networks. arXiv, preprint (2014). https://doi.org/10.48550/arXiv.1406.2661 Sun and Erath [2015] Sun, L., Erath, A.: A bayesian network approach for population synthesis. Transportation Research Part C: Emerging Technologies 61, 49–62 (2015) https://doi.org/10.1016/j.trc.2015.10.010 Reiter [2005] Reiter, J.P.: Using cart to generate partially synthetic public use microdata. Journal of Official Statistics 21(3), 441–462 (2005) Drechsler and Reiter [2011] Drechsler, J., Reiter, J.P.: An empirical evaluation of easily implemented, nonparametric methods for generating synthetic datasets. Comput. Stat. Data Anal. 55(12), 3232–3243 (2011) European Medicines Agency [2018] European Medicines Agency: External guidance on the implementation of the European Medicines Agency policy on the publication of clinical data for medicinal products for human use. https://www.ema.europa.eu/en/human-regulatory/marketing-authorisation/clinical-data-publication/support-industry/external-guidance-implementation-european-medicines-agency-policy-publication-clinical-data (2018) Health Canada [2019] Health Canada: Public Release of Clinical Information: guidance document. 
https://www.canada.ca/en/health-canada/services/drug-health-product-review-approval/profile-public-release-clinical-information-guidance/document.html (2019) Karr et al. [2006] Karr, A.F., Kohnen, C.N., Oganian, A., Reiter, J.P., Sanil, A.P.: A framework for evaluating the utility of data altered to protect confidentiality. The American Statistician 60(3), 224–232 (2006) https://doi.org/10.1198/000313006x124640 Zhu et al. [2022] Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). https://doi.org/10.1109/BigData55660.2022.10020639 Villani [2009] Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. [2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. 
arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Qian, Z., Cebere, B., Schaar, M.: Synthcity: facilitating innovative use cases of synthetic data in different data modalities. 
arXiv, preprint (2023). https://doi.org/10.48550/arXiv.2301.07573 DataCebo, Inc. [2023] DataCebo, Inc.: Synthetic Data Metrics. (2023). DataCebo, Inc.. Version 0.9.3. https://docs.sdv.dev/sdmetrics/ Brenninkmeijer [2021] Brenninkmeijer, B.: Table Evaluator. GitHub (2021) Yale et al. [2019] Yale, A., Dash, S., Dutta, R., Guyon, I., Pavao, A., Bennett, K.P.: Privacy preserving synthetic health data. In: 27th European Symposium on Artificial Neural Networks, ESANN 2019, Bruges, Belgium, April 24-26, 2019 (2019) Yan et al. [2022] Yan, C., Yan, Y., Wan, Z., Zhang, Z., Omberg, L., Guinney, J., Mooney, S.D., Malin, B.A.: A multifaceted benchmarking of synthetic electronic health record generation models. Nature Communications 13(1) (2022) https://doi.org/10.1038/s41467-022-35295-1 Murtaza et al. [2023] Murtaza, H., Ahmed, M., Khan, N.F., Murtaza, G., Zafar, S., Bano, A.: Synthetic data generation: State of the art in health care domain. Computer Science Review 48, 100546 (2023) https://doi.org/10.1016/j.cosrev.2023.100546 [22] Lautrup, A.D., Hyrup, T., Zimek, A., Schneider-Kamp, P.: Systematised review of generative modelling tools and utility metrics for fully synthetic tabular data. [under review] Gower [1971] Gower, J.C.: A general coefficient of similarity and some of its properties. Biometrics 27(4), 857–871 (1971) https://doi.org/10.2307/2528823 Kamal et al. [2019] Kamal, S., ElEleimy, M., Hegazy, D., Nasr, M.: Hepatitis C Virus (HCV) for Egyptian patients. UCI Machine Learning Repository, dataset (2019). https://doi.org/10.24432/c5989v Chawla et al. [2002] Chawla, N.V., Bowyer, K.W., Hall, L.O., Kegelmeyer, W.P.: SMOTE: synthetic minority over-sampling technique. J. Artif. Intell. Res. 16, 321–357 (2002) https://doi.org/10.1613/jair.953 Yoon et al. [2020] Yoon, J., Drumright, L.N., Schaar, M.: Anonymization through data synthesis using generative adversarial networks (ADS-GAN). IEEE J. Biomed. 
Health Informatics 24(8), 2378–2388 (2020) https://doi.org/10.1109/JBHI.2020.2980262 Xu et al. [2019] Xu, L., Skoularidou, M., Cuesta-Infante, A., Veeramachaneni, K.: Modeling tabular data using conditional GAN. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 7333–7343 (2019) Goodfellow et al. [2014] Goodfellow, I.J., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., Bengio, Y.: Generative Adversarial Networks. arXiv, preprint (2014). https://doi.org/10.48550/arXiv.1406.2661 Sun and Erath [2015] Sun, L., Erath, A.: A bayesian network approach for population synthesis. Transportation Research Part C: Emerging Technologies 61, 49–62 (2015) https://doi.org/10.1016/j.trc.2015.10.010 Reiter [2005] Reiter, J.P.: Using cart to generate partially synthetic public use microdata. Journal of Official Statistics 21(3), 441–462 (2005) Drechsler and Reiter [2011] Drechsler, J., Reiter, J.P.: An empirical evaluation of easily implemented, nonparametric methods for generating synthetic datasets. Comput. Stat. Data Anal. 55(12), 3232–3243 (2011) European Medicines Agency [2018] European Medicines Agency: External guidance on the implementation of the European Medicines Agency policy on the publication of clinical data for medicinal products for human use. https://www.ema.europa.eu/en/human-regulatory/marketing-authorisation/clinical-data-publication/support-industry/external-guidance-implementation-european-medicines-agency-policy-publication-clinical-data (2018) Health Canada [2019] Health Canada: Public Release of Clinical Information: guidance document. https://www.canada.ca/en/health-canada/services/drug-health-product-review-approval/profile-public-release-clinical-information-guidance/document.html (2019) Karr et al. 
[2006] Karr, A.F., Kohnen, C.N., Oganian, A., Reiter, J.P., Sanil, A.P.: A framework for evaluating the utility of data altered to protect confidentiality. The American Statistician 60(3), 224–232 (2006) https://doi.org/10.1198/000313006x124640 Zhu et al. [2022] Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). https://doi.org/10.1109/BigData55660.2022.10020639 Villani [2009] Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. [2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. 
[2006] Karr, A.F., Kohnen, C.N., Oganian, A., Reiter, J.P., Sanil, A.P.: A framework for evaluating the utility of data altered to protect confidentiality. The American Statistician 60(3), 224–232 (2006) https://doi.org/10.1198/000313006x124640 Zhu et al. [2022] Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). https://doi.org/10.1109/BigData55660.2022.10020639 Villani [2009] Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. [2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. 
[2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Gower, J.C.: A general coefficient of similarity and some of its properties. Biometrics 27(4), 857–871 (1971) https://doi.org/10.2307/2528823 Kamal et al. 
[2019] Kamal, S., ElEleimy, M., Hegazy, D., Nasr, M.: Hepatitis C Virus (HCV) for Egyptian patients. UCI Machine Learning Repository, dataset (2019). https://doi.org/10.24432/c5989v Chawla et al. [2002] Chawla, N.V., Bowyer, K.W., Hall, L.O., Kegelmeyer, W.P.: SMOTE: synthetic minority over-sampling technique. J. Artif. Intell. Res. 16, 321–357 (2002) https://doi.org/10.1613/jair.953 Yoon et al. [2020] Yoon, J., Drumright, L.N., Schaar, M.: Anonymization through data synthesis using generative adversarial networks (ADS-GAN). IEEE J. Biomed. Health Informatics 24(8), 2378–2388 (2020) https://doi.org/10.1109/JBHI.2020.2980262 Xu et al. [2019] Xu, L., Skoularidou, M., Cuesta-Infante, A., Veeramachaneni, K.: Modeling tabular data using conditional GAN. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 7333–7343 (2019) Goodfellow et al. [2014] Goodfellow, I.J., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., Bengio, Y.: Generative Adversarial Networks. arXiv, preprint (2014). https://doi.org/10.48550/arXiv.1406.2661 Sun and Erath [2015] Sun, L., Erath, A.: A bayesian network approach for population synthesis. Transportation Research Part C: Emerging Technologies 61, 49–62 (2015) https://doi.org/10.1016/j.trc.2015.10.010 Reiter [2005] Reiter, J.P.: Using cart to generate partially synthetic public use microdata. Journal of Official Statistics 21(3), 441–462 (2005) Drechsler and Reiter [2011] Drechsler, J., Reiter, J.P.: An empirical evaluation of easily implemented, nonparametric methods for generating synthetic datasets. Comput. Stat. Data Anal. 
55(12), 3232–3243 (2011) European Medicines Agency [2018] European Medicines Agency: External guidance on the implementation of the European Medicines Agency policy on the publication of clinical data for medicinal products for human use. https://www.ema.europa.eu/en/human-regulatory/marketing-authorisation/clinical-data-publication/support-industry/external-guidance-implementation-european-medicines-agency-policy-publication-clinical-data (2018) Health Canada [2019] Health Canada: Public Release of Clinical Information: guidance document. https://www.canada.ca/en/health-canada/services/drug-health-product-review-approval/profile-public-release-clinical-information-guidance/document.html (2019) Karr et al. [2006] Karr, A.F., Kohnen, C.N., Oganian, A., Reiter, J.P., Sanil, A.P.: A framework for evaluating the utility of data altered to protect confidentiality. The American Statistician 60(3), 224–232 (2006) https://doi.org/10.1198/000313006x124640 Zhu et al. [2022] Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). https://doi.org/10.1109/BigData55660.2022.10020639 Villani [2009] Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. [2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. 
IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. 
[2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Kamal, S., ElEleimy, M., Hegazy, D., Nasr, M.: Hepatitis C Virus (HCV) for Egyptian patients. UCI Machine Learning Repository, dataset (2019). https://doi.org/10.24432/c5989v Chawla et al. [2002] Chawla, N.V., Bowyer, K.W., Hall, L.O., Kegelmeyer, W.P.: SMOTE: synthetic minority over-sampling technique. J. Artif. Intell. Res. 16, 321–357 (2002) https://doi.org/10.1613/jair.953 Yoon et al. [2020] Yoon, J., Drumright, L.N., Schaar, M.: Anonymization through data synthesis using generative adversarial networks (ADS-GAN). IEEE J. Biomed. Health Informatics 24(8), 2378–2388 (2020) https://doi.org/10.1109/JBHI.2020.2980262 Xu et al. [2019] Xu, L., Skoularidou, M., Cuesta-Infante, A., Veeramachaneni, K.: Modeling tabular data using conditional GAN. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 7333–7343 (2019) Goodfellow et al. [2014] Goodfellow, I.J., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., Bengio, Y.: Generative Adversarial Networks. arXiv, preprint (2014). https://doi.org/10.48550/arXiv.1406.2661 Sun and Erath [2015] Sun, L., Erath, A.: A bayesian network approach for population synthesis. 
Transportation Research Part C: Emerging Technologies 61, 49–62 (2015) https://doi.org/10.1016/j.trc.2015.10.010 Reiter [2005] Reiter, J.P.: Using cart to generate partially synthetic public use microdata. Journal of Official Statistics 21(3), 441–462 (2005) Drechsler and Reiter [2011] Drechsler, J., Reiter, J.P.: An empirical evaluation of easily implemented, nonparametric methods for generating synthetic datasets. Comput. Stat. Data Anal. 55(12), 3232–3243 (2011) European Medicines Agency [2018] European Medicines Agency: External guidance on the implementation of the European Medicines Agency policy on the publication of clinical data for medicinal products for human use. https://www.ema.europa.eu/en/human-regulatory/marketing-authorisation/clinical-data-publication/support-industry/external-guidance-implementation-european-medicines-agency-policy-publication-clinical-data (2018) Health Canada [2019] Health Canada: Public Release of Clinical Information: guidance document. https://www.canada.ca/en/health-canada/services/drug-health-product-review-approval/profile-public-release-clinical-information-guidance/document.html (2019) Karr et al. [2006] Karr, A.F., Kohnen, C.N., Oganian, A., Reiter, J.P., Sanil, A.P.: A framework for evaluating the utility of data altered to protect confidentiality. The American Statistician 60(3), 224–232 (2006) https://doi.org/10.1198/000313006x124640 Zhu et al. [2022] Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). https://doi.org/10.1109/BigData55660.2022.10020639 Villani [2009] Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. 
[2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). 
https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Chawla, N.V., Bowyer, K.W., Hall, L.O., Kegelmeyer, W.P.: SMOTE: synthetic minority over-sampling technique. J. Artif. Intell. Res. 16, 321–357 (2002) https://doi.org/10.1613/jair.953 Yoon et al. [2020] Yoon, J., Drumright, L.N., Schaar, M.: Anonymization through data synthesis using generative adversarial networks (ADS-GAN). IEEE J. Biomed. Health Informatics 24(8), 2378–2388 (2020) https://doi.org/10.1109/JBHI.2020.2980262 Xu et al. [2019] Xu, L., Skoularidou, M., Cuesta-Infante, A., Veeramachaneni, K.: Modeling tabular data using conditional GAN. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 7333–7343 (2019) Goodfellow et al. [2014] Goodfellow, I.J., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., Bengio, Y.: Generative Adversarial Networks. arXiv, preprint (2014). 
https://doi.org/10.48550/arXiv.1406.2661 Sun and Erath [2015] Sun, L., Erath, A.: A bayesian network approach for population synthesis. Transportation Research Part C: Emerging Technologies 61, 49–62 (2015) https://doi.org/10.1016/j.trc.2015.10.010 Reiter [2005] Reiter, J.P.: Using cart to generate partially synthetic public use microdata. Journal of Official Statistics 21(3), 441–462 (2005) Drechsler and Reiter [2011] Drechsler, J., Reiter, J.P.: An empirical evaluation of easily implemented, nonparametric methods for generating synthetic datasets. Comput. Stat. Data Anal. 55(12), 3232–3243 (2011) European Medicines Agency [2018] European Medicines Agency: External guidance on the implementation of the European Medicines Agency policy on the publication of clinical data for medicinal products for human use. https://www.ema.europa.eu/en/human-regulatory/marketing-authorisation/clinical-data-publication/support-industry/external-guidance-implementation-european-medicines-agency-policy-publication-clinical-data (2018) Health Canada [2019] Health Canada: Public Release of Clinical Information: guidance document. https://www.canada.ca/en/health-canada/services/drug-health-product-review-approval/profile-public-release-clinical-information-guidance/document.html (2019) Karr et al. [2006] Karr, A.F., Kohnen, C.N., Oganian, A., Reiter, J.P., Sanil, A.P.: A framework for evaluating the utility of data altered to protect confidentiality. The American Statistician 60(3), 224–232 (2006) https://doi.org/10.1198/000313006x124640 Zhu et al. [2022] Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). https://doi.org/10.1109/BigData55660.2022.10020639 Villani [2009] Villani, C.: Optimal Transport. 
Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. [2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). 
IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Yoon, J., Drumright, L.N., Schaar, M.: Anonymization through data synthesis using generative adversarial networks (ADS-GAN). IEEE J. Biomed. Health Informatics 24(8), 2378–2388 (2020) https://doi.org/10.1109/JBHI.2020.2980262 Xu et al. [2019] Xu, L., Skoularidou, M., Cuesta-Infante, A., Veeramachaneni, K.: Modeling tabular data using conditional GAN. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 7333–7343 (2019) Goodfellow et al. [2014] Goodfellow, I.J., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., Bengio, Y.: Generative Adversarial Networks. arXiv, preprint (2014). https://doi.org/10.48550/arXiv.1406.2661 Sun and Erath [2015] Sun, L., Erath, A.: A bayesian network approach for population synthesis. 
Transportation Research Part C: Emerging Technologies 61, 49–62 (2015) https://doi.org/10.1016/j.trc.2015.10.010 Reiter [2005] Reiter, J.P.: Using cart to generate partially synthetic public use microdata. Journal of Official Statistics 21(3), 441–462 (2005) Drechsler and Reiter [2011] Drechsler, J., Reiter, J.P.: An empirical evaluation of easily implemented, nonparametric methods for generating synthetic datasets. Comput. Stat. Data Anal. 55(12), 3232–3243 (2011) European Medicines Agency [2018] European Medicines Agency: External guidance on the implementation of the European Medicines Agency policy on the publication of clinical data for medicinal products for human use. https://www.ema.europa.eu/en/human-regulatory/marketing-authorisation/clinical-data-publication/support-industry/external-guidance-implementation-european-medicines-agency-policy-publication-clinical-data (2018) Health Canada [2019] Health Canada: Public Release of Clinical Information: guidance document. https://www.canada.ca/en/health-canada/services/drug-health-product-review-approval/profile-public-release-clinical-information-guidance/document.html (2019) Karr et al. [2006] Karr, A.F., Kohnen, C.N., Oganian, A., Reiter, J.P., Sanil, A.P.: A framework for evaluating the utility of data altered to protect confidentiality. The American Statistician 60(3), 224–232 (2006) https://doi.org/10.1198/000313006x124640 Zhu et al. [2022] Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). https://doi.org/10.1109/BigData55660.2022.10020639 Villani [2009] Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. 
[2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. 
[2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. [2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. 
Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. 
[2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. 
[2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. 
[2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. 
Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). 
https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. 
[2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. 
Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. 
[2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. 
[2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. 
[2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083
  10. Figueira, A., Vaz, B.: Survey on synthetic data generation, evaluation methods and GANs. Mathematics 10(15), 2733 (2022). https://doi.org/10.3390/math10152733
[2021] Lenz, S., Hess, M., Binder, H.: Deep generative models in DataSHIELD. BMC Medical Research Methodology 21(1) (2021) https://doi.org/10.1186/s12874-021-01237-6 Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X., El-Hussuna, A.: Utility metrics for evaluating synthetic health data generation methods: Validation study. JMIR Medical Informatics 10(4), 35734 (2022) https://doi.org/10.2196/35734 Qian et al. [2023] Qian, Z., Cebere, B., Schaar, M.: Synthcity: facilitating innovative use cases of synthetic data in different data modalities. arXiv, preprint (2023). https://doi.org/10.48550/arXiv.2301.07573 DataCebo, Inc. [2023] DataCebo, Inc.: Synthetic Data Metrics. (2023). DataCebo, Inc.. Version 0.9.3. https://docs.sdv.dev/sdmetrics/ Brenninkmeijer [2021] Brenninkmeijer, B.: Table Evaluator. GitHub (2021) Yale et al. [2019] Yale, A., Dash, S., Dutta, R., Guyon, I., Pavao, A., Bennett, K.P.: Privacy preserving synthetic health data. In: 27th European Symposium on Artificial Neural Networks, ESANN 2019, Bruges, Belgium, April 24-26, 2019 (2019) Yan et al. [2022] Yan, C., Yan, Y., Wan, Z., Zhang, Z., Omberg, L., Guinney, J., Mooney, S.D., Malin, B.A.: A multifaceted benchmarking of synthetic electronic health record generation models. Nature Communications 13(1) (2022) https://doi.org/10.1038/s41467-022-35295-1 Murtaza et al. [2023] Murtaza, H., Ahmed, M., Khan, N.F., Murtaza, G., Zafar, S., Bano, A.: Synthetic data generation: State of the art in health care domain. Computer Science Review 48, 100546 (2023) https://doi.org/10.1016/j.cosrev.2023.100546 [22] Lautrup, A.D., Hyrup, T., Zimek, A., Schneider-Kamp, P.: Systematised review of generative modelling tools and utility metrics for fully synthetic tabular data. [under review] Gower [1971] Gower, J.C.: A general coefficient of similarity and some of its properties. Biometrics 27(4), 857–871 (1971) https://doi.org/10.2307/2528823 Kamal et al. 
[2019] Kamal, S., ElEleimy, M., Hegazy, D., Nasr, M.: Hepatitis C Virus (HCV) for Egyptian patients. UCI Machine Learning Repository, dataset (2019). https://doi.org/10.24432/c5989v Chawla et al. [2002] Chawla, N.V., Bowyer, K.W., Hall, L.O., Kegelmeyer, W.P.: SMOTE: synthetic minority over-sampling technique. J. Artif. Intell. Res. 16, 321–357 (2002) https://doi.org/10.1613/jair.953 Yoon et al. [2020] Yoon, J., Drumright, L.N., Schaar, M.: Anonymization through data synthesis using generative adversarial networks (ADS-GAN). IEEE J. Biomed. Health Informatics 24(8), 2378–2388 (2020) https://doi.org/10.1109/JBHI.2020.2980262 Xu et al. [2019] Xu, L., Skoularidou, M., Cuesta-Infante, A., Veeramachaneni, K.: Modeling tabular data using conditional GAN. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 7333–7343 (2019) Goodfellow et al. [2014] Goodfellow, I.J., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., Bengio, Y.: Generative Adversarial Networks. arXiv, preprint (2014). https://doi.org/10.48550/arXiv.1406.2661 Sun and Erath [2015] Sun, L., Erath, A.: A bayesian network approach for population synthesis. Transportation Research Part C: Emerging Technologies 61, 49–62 (2015) https://doi.org/10.1016/j.trc.2015.10.010 Reiter [2005] Reiter, J.P.: Using cart to generate partially synthetic public use microdata. Journal of Official Statistics 21(3), 441–462 (2005) Drechsler and Reiter [2011] Drechsler, J., Reiter, J.P.: An empirical evaluation of easily implemented, nonparametric methods for generating synthetic datasets. Comput. Stat. Data Anal. 
55(12), 3232–3243 (2011) European Medicines Agency [2018] European Medicines Agency: External guidance on the implementation of the European Medicines Agency policy on the publication of clinical data for medicinal products for human use. https://www.ema.europa.eu/en/human-regulatory/marketing-authorisation/clinical-data-publication/support-industry/external-guidance-implementation-european-medicines-agency-policy-publication-clinical-data (2018) Health Canada [2019] Health Canada: Public Release of Clinical Information: guidance document. https://www.canada.ca/en/health-canada/services/drug-health-product-review-approval/profile-public-release-clinical-information-guidance/document.html (2019) Karr et al. [2006] Karr, A.F., Kohnen, C.N., Oganian, A., Reiter, J.P., Sanil, A.P.: A framework for evaluating the utility of data altered to protect confidentiality. The American Statistician 60(3), 224–232 (2006) https://doi.org/10.1198/000313006x124640 Zhu et al. [2022] Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). https://doi.org/10.1109/BigData55660.2022.10020639 Villani [2009] Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. [2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. 
IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. 
[2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Yale, A., Dash, S., Bhanot, K., Guyon, I., Erickson, J.S., Bennett, K.P.: Synthesizing quality open data assets from private health research studies. In: Abramowicz, W., Klein, G. (eds.) Business Information Systems Workshops - BIS 2020 International Workshops, Colorado Springs, CO, USA, June 8-10, 2020, Revised Selected Papers. Lecture Notes in Business Information Processing, vol. 394, pp. 324–335 (2020). https://doi.org/10.1007/978-3-030-61146-0_26 Lenz et al. [2021] Lenz, S., Hess, M., Binder, H.: Deep generative models in DataSHIELD. BMC Medical Research Methodology 21(1) (2021) https://doi.org/10.1186/s12874-021-01237-6 Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X., El-Hussuna, A.: Utility metrics for evaluating synthetic health data generation methods: Validation study. JMIR Medical Informatics 10(4), 35734 (2022) https://doi.org/10.2196/35734 Qian et al. [2023] Qian, Z., Cebere, B., Schaar, M.: Synthcity: facilitating innovative use cases of synthetic data in different data modalities. arXiv, preprint (2023). https://doi.org/10.48550/arXiv.2301.07573 DataCebo, Inc. [2023] DataCebo, Inc.: Synthetic Data Metrics. (2023). DataCebo, Inc.. Version 0.9.3. https://docs.sdv.dev/sdmetrics/ Brenninkmeijer [2021] Brenninkmeijer, B.: Table Evaluator. GitHub (2021) Yale et al. [2019] Yale, A., Dash, S., Dutta, R., Guyon, I., Pavao, A., Bennett, K.P.: Privacy preserving synthetic health data. In: 27th European Symposium on Artificial Neural Networks, ESANN 2019, Bruges, Belgium, April 24-26, 2019 (2019) Yan et al. 
[2022] Yan, C., Yan, Y., Wan, Z., Zhang, Z., Omberg, L., Guinney, J., Mooney, S.D., Malin, B.A.: A multifaceted benchmarking of synthetic electronic health record generation models. Nature Communications 13(1) (2022) https://doi.org/10.1038/s41467-022-35295-1 Murtaza et al. [2023] Murtaza, H., Ahmed, M., Khan, N.F., Murtaza, G., Zafar, S., Bano, A.: Synthetic data generation: State of the art in health care domain. Computer Science Review 48, 100546 (2023) https://doi.org/10.1016/j.cosrev.2023.100546 [22] Lautrup, A.D., Hyrup, T., Zimek, A., Schneider-Kamp, P.: Systematised review of generative modelling tools and utility metrics for fully synthetic tabular data. [under review] Gower [1971] Gower, J.C.: A general coefficient of similarity and some of its properties. Biometrics 27(4), 857–871 (1971) https://doi.org/10.2307/2528823 Kamal et al. [2019] Kamal, S., ElEleimy, M., Hegazy, D., Nasr, M.: Hepatitis C Virus (HCV) for Egyptian patients. UCI Machine Learning Repository, dataset (2019). https://doi.org/10.24432/c5989v Chawla et al. [2002] Chawla, N.V., Bowyer, K.W., Hall, L.O., Kegelmeyer, W.P.: SMOTE: synthetic minority over-sampling technique. J. Artif. Intell. Res. 16, 321–357 (2002) https://doi.org/10.1613/jair.953 Yoon et al. [2020] Yoon, J., Drumright, L.N., Schaar, M.: Anonymization through data synthesis using generative adversarial networks (ADS-GAN). IEEE J. Biomed. Health Informatics 24(8), 2378–2388 (2020) https://doi.org/10.1109/JBHI.2020.2980262 Xu et al. [2019] Xu, L., Skoularidou, M., Cuesta-Infante, A., Veeramachaneni, K.: Modeling tabular data using conditional GAN. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 7333–7343 (2019) Goodfellow et al. 
[2014] Goodfellow, I.J., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., Bengio, Y.: Generative Adversarial Networks. arXiv, preprint (2014). https://doi.org/10.48550/arXiv.1406.2661 Sun and Erath [2015] Sun, L., Erath, A.: A bayesian network approach for population synthesis. Transportation Research Part C: Emerging Technologies 61, 49–62 (2015) https://doi.org/10.1016/j.trc.2015.10.010 Reiter [2005] Reiter, J.P.: Using cart to generate partially synthetic public use microdata. Journal of Official Statistics 21(3), 441–462 (2005) Drechsler and Reiter [2011] Drechsler, J., Reiter, J.P.: An empirical evaluation of easily implemented, nonparametric methods for generating synthetic datasets. Comput. Stat. Data Anal. 55(12), 3232–3243 (2011) European Medicines Agency [2018] European Medicines Agency: External guidance on the implementation of the European Medicines Agency policy on the publication of clinical data for medicinal products for human use. https://www.ema.europa.eu/en/human-regulatory/marketing-authorisation/clinical-data-publication/support-industry/external-guidance-implementation-european-medicines-agency-policy-publication-clinical-data (2018) Health Canada [2019] Health Canada: Public Release of Clinical Information: guidance document. https://www.canada.ca/en/health-canada/services/drug-health-product-review-approval/profile-public-release-clinical-information-guidance/document.html (2019) Karr et al. [2006] Karr, A.F., Kohnen, C.N., Oganian, A., Reiter, J.P., Sanil, A.P.: A framework for evaluating the utility of data altered to protect confidentiality. The American Statistician 60(3), 224–232 (2006) https://doi.org/10.1198/000313006x124640 Zhu et al. [2022] Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) 
IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). https://doi.org/10.1109/BigData55660.2022.10020639 Villani [2009] Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. [2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. 
[2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Lenz, S., Hess, M., Binder, H.: Deep generative models in DataSHIELD. BMC Medical Research Methodology 21(1) (2021) https://doi.org/10.1186/s12874-021-01237-6 Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X., El-Hussuna, A.: Utility metrics for evaluating synthetic health data generation methods: Validation study. JMIR Medical Informatics 10(4), 35734 (2022) https://doi.org/10.2196/35734 Qian et al. [2023] Qian, Z., Cebere, B., Schaar, M.: Synthcity: facilitating innovative use cases of synthetic data in different data modalities. arXiv, preprint (2023). https://doi.org/10.48550/arXiv.2301.07573 DataCebo, Inc. [2023] DataCebo, Inc.: Synthetic Data Metrics. (2023). DataCebo, Inc.. Version 0.9.3. 
https://docs.sdv.dev/sdmetrics/ Brenninkmeijer [2021] Brenninkmeijer, B.: Table Evaluator. GitHub (2021) Yale et al. [2019] Yale, A., Dash, S., Dutta, R., Guyon, I., Pavao, A., Bennett, K.P.: Privacy preserving synthetic health data. In: 27th European Symposium on Artificial Neural Networks, ESANN 2019, Bruges, Belgium, April 24-26, 2019 (2019) Yan et al. [2022] Yan, C., Yan, Y., Wan, Z., Zhang, Z., Omberg, L., Guinney, J., Mooney, S.D., Malin, B.A.: A multifaceted benchmarking of synthetic electronic health record generation models. Nature Communications 13(1) (2022) https://doi.org/10.1038/s41467-022-35295-1 Murtaza et al. [2023] Murtaza, H., Ahmed, M., Khan, N.F., Murtaza, G., Zafar, S., Bano, A.: Synthetic data generation: State of the art in health care domain. Computer Science Review 48, 100546 (2023) https://doi.org/10.1016/j.cosrev.2023.100546 [22] Lautrup, A.D., Hyrup, T., Zimek, A., Schneider-Kamp, P.: Systematised review of generative modelling tools and utility metrics for fully synthetic tabular data. [under review] Gower [1971] Gower, J.C.: A general coefficient of similarity and some of its properties. Biometrics 27(4), 857–871 (1971) https://doi.org/10.2307/2528823 Kamal et al. [2019] Kamal, S., ElEleimy, M., Hegazy, D., Nasr, M.: Hepatitis C Virus (HCV) for Egyptian patients. UCI Machine Learning Repository, dataset (2019). https://doi.org/10.24432/c5989v Chawla et al. [2002] Chawla, N.V., Bowyer, K.W., Hall, L.O., Kegelmeyer, W.P.: SMOTE: synthetic minority over-sampling technique. J. Artif. Intell. Res. 16, 321–357 (2002) https://doi.org/10.1613/jair.953 Yoon et al. [2020] Yoon, J., Drumright, L.N., Schaar, M.: Anonymization through data synthesis using generative adversarial networks (ADS-GAN). IEEE J. Biomed. Health Informatics 24(8), 2378–2388 (2020) https://doi.org/10.1109/JBHI.2020.2980262 Xu et al. [2019] Xu, L., Skoularidou, M., Cuesta-Infante, A., Veeramachaneni, K.: Modeling tabular data using conditional GAN. 
In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 7333–7343 (2019) Goodfellow et al. [2014] Goodfellow, I.J., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., Bengio, Y.: Generative Adversarial Networks. arXiv, preprint (2014). https://doi.org/10.48550/arXiv.1406.2661 Sun and Erath [2015] Sun, L., Erath, A.: A bayesian network approach for population synthesis. Transportation Research Part C: Emerging Technologies 61, 49–62 (2015) https://doi.org/10.1016/j.trc.2015.10.010 Reiter [2005] Reiter, J.P.: Using cart to generate partially synthetic public use microdata. Journal of Official Statistics 21(3), 441–462 (2005) Drechsler and Reiter [2011] Drechsler, J., Reiter, J.P.: An empirical evaluation of easily implemented, nonparametric methods for generating synthetic datasets. Comput. Stat. Data Anal. 55(12), 3232–3243 (2011) European Medicines Agency [2018] European Medicines Agency: External guidance on the implementation of the European Medicines Agency policy on the publication of clinical data for medicinal products for human use. https://www.ema.europa.eu/en/human-regulatory/marketing-authorisation/clinical-data-publication/support-industry/external-guidance-implementation-european-medicines-agency-policy-publication-clinical-data (2018) Health Canada [2019] Health Canada: Public Release of Clinical Information: guidance document. https://www.canada.ca/en/health-canada/services/drug-health-product-review-approval/profile-public-release-clinical-information-guidance/document.html (2019) Karr et al. [2006] Karr, A.F., Kohnen, C.N., Oganian, A., Reiter, J.P., Sanil, A.P.: A framework for evaluating the utility of data altered to protect confidentiality. 
The American Statistician 60(3), 224–232 (2006) https://doi.org/10.1198/000313006x124640 Zhu et al. [2022] Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). https://doi.org/10.1109/BigData55660.2022.10020639 Villani [2009] Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. [2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. 
Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Emam, K.E., Mosquera, L., Fang, X., El-Hussuna, A.: Utility metrics for evaluating synthetic health data generation methods: Validation study. JMIR Medical Informatics 10(4), 35734 (2022) https://doi.org/10.2196/35734 Qian et al. 
[2023] Qian, Z., Cebere, B., Schaar, M.: Synthcity: facilitating innovative use cases of synthetic data in different data modalities. arXiv, preprint (2023). https://doi.org/10.48550/arXiv.2301.07573
DataCebo, Inc. [2023] DataCebo, Inc.: Synthetic Data Metrics. (2023). DataCebo, Inc. Version 0.9.3. https://docs.sdv.dev/sdmetrics/
Brenninkmeijer [2021] Brenninkmeijer, B.: Table Evaluator. GitHub (2021)
Yale et al. [2019] Yale, A., Dash, S., Dutta, R., Guyon, I., Pavao, A., Bennett, K.P.: Privacy preserving synthetic health data. In: 27th European Symposium on Artificial Neural Networks, ESANN 2019, Bruges, Belgium, April 24-26, 2019 (2019)
Yan et al. [2022] Yan, C., Yan, Y., Wan, Z., Zhang, Z., Omberg, L., Guinney, J., Mooney, S.D., Malin, B.A.: A multifaceted benchmarking of synthetic electronic health record generation models. Nature Communications 13(1) (2022) https://doi.org/10.1038/s41467-022-35295-1
Murtaza et al. [2023] Murtaza, H., Ahmed, M., Khan, N.F., Murtaza, G., Zafar, S., Bano, A.: Synthetic data generation: State of the art in health care domain. Computer Science Review 48, 100546 (2023) https://doi.org/10.1016/j.cosrev.2023.100546
[22] Lautrup, A.D., Hyrup, T., Zimek, A., Schneider-Kamp, P.: Systematised review of generative modelling tools and utility metrics for fully synthetic tabular data. [under review]
Gower [1971] Gower, J.C.: A general coefficient of similarity and some of its properties. Biometrics 27(4), 857–871 (1971) https://doi.org/10.2307/2528823
Kamal et al. [2019] Kamal, S., ElEleimy, M., Hegazy, D., Nasr, M.: Hepatitis C Virus (HCV) for Egyptian patients. UCI Machine Learning Repository, dataset (2019). https://doi.org/10.24432/c5989v
Chawla et al. [2002] Chawla, N.V., Bowyer, K.W., Hall, L.O., Kegelmeyer, W.P.: SMOTE: synthetic minority over-sampling technique. J. Artif. Intell. Res. 16, 321–357 (2002) https://doi.org/10.1613/jair.953
Yoon et al. [2020] Yoon, J., Drumright, L.N., Schaar, M.: Anonymization through data synthesis using generative adversarial networks (ADS-GAN). IEEE J. Biomed. Health Informatics 24(8), 2378–2388 (2020) https://doi.org/10.1109/JBHI.2020.2980262
Xu et al. [2019] Xu, L., Skoularidou, M., Cuesta-Infante, A., Veeramachaneni, K.: Modeling tabular data using conditional GAN. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 7333–7343 (2019)
Goodfellow et al. [2014] Goodfellow, I.J., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., Bengio, Y.: Generative Adversarial Networks. arXiv, preprint (2014). https://doi.org/10.48550/arXiv.1406.2661
Sun and Erath [2015] Sun, L., Erath, A.: A bayesian network approach for population synthesis. Transportation Research Part C: Emerging Technologies 61, 49–62 (2015) https://doi.org/10.1016/j.trc.2015.10.010
Reiter [2005] Reiter, J.P.: Using cart to generate partially synthetic public use microdata. Journal of Official Statistics 21(3), 441–462 (2005)
Drechsler and Reiter [2011] Drechsler, J., Reiter, J.P.: An empirical evaluation of easily implemented, nonparametric methods for generating synthetic datasets. Comput. Stat. Data Anal. 55(12), 3232–3243 (2011)
European Medicines Agency [2018] European Medicines Agency: External guidance on the implementation of the European Medicines Agency policy on the publication of clinical data for medicinal products for human use. https://www.ema.europa.eu/en/human-regulatory/marketing-authorisation/clinical-data-publication/support-industry/external-guidance-implementation-european-medicines-agency-policy-publication-clinical-data (2018)
Health Canada [2019] Health Canada: Public Release of Clinical Information: guidance document. https://www.canada.ca/en/health-canada/services/drug-health-product-review-approval/profile-public-release-clinical-information-guidance/document.html (2019)
Karr et al. [2006] Karr, A.F., Kohnen, C.N., Oganian, A., Reiter, J.P., Sanil, A.P.: A framework for evaluating the utility of data altered to protect confidentiality. The American Statistician 60(3), 224–232 (2006) https://doi.org/10.1198/000313006x124640
Zhu et al. [2022] Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). https://doi.org/10.1109/BigData55660.2022.10020639
Villani [2009] Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9
Hesterberg et al. [2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16
Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722
Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605
Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568
Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data.
arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078
Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358
Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298
Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420
Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802
Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369
Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020)
Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083
https://www.ema.europa.eu/en/human-regulatory/marketing-authorisation/clinical-data-publication/support-industry/external-guidance-implementation-european-medicines-agency-policy-publication-clinical-data (2018) Health Canada [2019] Health Canada: Public Release of Clinical Information: guidance document. https://www.canada.ca/en/health-canada/services/drug-health-product-review-approval/profile-public-release-clinical-information-guidance/document.html (2019) Karr et al. [2006] Karr, A.F., Kohnen, C.N., Oganian, A., Reiter, J.P., Sanil, A.P.: A framework for evaluating the utility of data altered to protect confidentiality. The American Statistician 60(3), 224–232 (2006) https://doi.org/10.1198/000313006x124640 Zhu et al. [2022] Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). https://doi.org/10.1109/BigData55660.2022.10020639 Villani [2009] Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. [2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. 
[2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. 
[2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Murtaza, H., Ahmed, M., Khan, N.F., Murtaza, G., Zafar, S., Bano, A.: Synthetic data generation: State of the art in health care domain. Computer Science Review 48, 100546 (2023) https://doi.org/10.1016/j.cosrev.2023.100546 [22] Lautrup, A.D., Hyrup, T., Zimek, A., Schneider-Kamp, P.: Systematised review of generative modelling tools and utility metrics for fully synthetic tabular data. [under review] Gower [1971] Gower, J.C.: A general coefficient of similarity and some of its properties. Biometrics 27(4), 857–871 (1971) https://doi.org/10.2307/2528823 Kamal et al. [2019] Kamal, S., ElEleimy, M., Hegazy, D., Nasr, M.: Hepatitis C Virus (HCV) for Egyptian patients. UCI Machine Learning Repository, dataset (2019). https://doi.org/10.24432/c5989v Chawla et al. [2002] Chawla, N.V., Bowyer, K.W., Hall, L.O., Kegelmeyer, W.P.: SMOTE: synthetic minority over-sampling technique. J. Artif. Intell. Res. 16, 321–357 (2002) https://doi.org/10.1613/jair.953 Yoon et al. [2020] Yoon, J., Drumright, L.N., Schaar, M.: Anonymization through data synthesis using generative adversarial networks (ADS-GAN). IEEE J. Biomed. Health Informatics 24(8), 2378–2388 (2020) https://doi.org/10.1109/JBHI.2020.2980262 Xu et al. [2019] Xu, L., Skoularidou, M., Cuesta-Infante, A., Veeramachaneni, K.: Modeling tabular data using conditional GAN. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 7333–7343 (2019) Goodfellow et al. [2014] Goodfellow, I.J., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., Bengio, Y.: Generative Adversarial Networks. 
arXiv, preprint (2014). https://doi.org/10.48550/arXiv.1406.2661 Sun and Erath [2015] Sun, L., Erath, A.: A bayesian network approach for population synthesis. Transportation Research Part C: Emerging Technologies 61, 49–62 (2015) https://doi.org/10.1016/j.trc.2015.10.010 Reiter [2005] Reiter, J.P.: Using cart to generate partially synthetic public use microdata. Journal of Official Statistics 21(3), 441–462 (2005) Drechsler and Reiter [2011] Drechsler, J., Reiter, J.P.: An empirical evaluation of easily implemented, nonparametric methods for generating synthetic datasets. Comput. Stat. Data Anal. 55(12), 3232–3243 (2011) European Medicines Agency [2018] European Medicines Agency: External guidance on the implementation of the European Medicines Agency policy on the publication of clinical data for medicinal products for human use. https://www.ema.europa.eu/en/human-regulatory/marketing-authorisation/clinical-data-publication/support-industry/external-guidance-implementation-european-medicines-agency-policy-publication-clinical-data (2018) Health Canada [2019] Health Canada: Public Release of Clinical Information: guidance document. https://www.canada.ca/en/health-canada/services/drug-health-product-review-approval/profile-public-release-clinical-information-guidance/document.html (2019) Karr et al. [2006] Karr, A.F., Kohnen, C.N., Oganian, A., Reiter, J.P., Sanil, A.P.: A framework for evaluating the utility of data altered to protect confidentiality. The American Statistician 60(3), 224–232 (2006) https://doi.org/10.1198/000313006x124640 Zhu et al. [2022] Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). 
https://doi.org/10.1109/BigData55660.2022.10020639 Villani [2009] Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. [2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. 
In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Lautrup, A.D., Hyrup, T., Zimek, A., Schneider-Kamp, P.: Systematised review of generative modelling tools and utility metrics for fully synthetic tabular data. [under review] Gower [1971] Gower, J.C.: A general coefficient of similarity and some of its properties. Biometrics 27(4), 857–871 (1971) https://doi.org/10.2307/2528823 Kamal et al. [2019] Kamal, S., ElEleimy, M., Hegazy, D., Nasr, M.: Hepatitis C Virus (HCV) for Egyptian patients. UCI Machine Learning Repository, dataset (2019). https://doi.org/10.24432/c5989v Chawla et al. [2002] Chawla, N.V., Bowyer, K.W., Hall, L.O., Kegelmeyer, W.P.: SMOTE: synthetic minority over-sampling technique. J. Artif. Intell. Res. 16, 321–357 (2002) https://doi.org/10.1613/jair.953 Yoon et al. [2020] Yoon, J., Drumright, L.N., Schaar, M.: Anonymization through data synthesis using generative adversarial networks (ADS-GAN). IEEE J. Biomed. 
Health Informatics 24(8), 2378–2388 (2020) https://doi.org/10.1109/JBHI.2020.2980262 Xu et al. [2019] Xu, L., Skoularidou, M., Cuesta-Infante, A., Veeramachaneni, K.: Modeling tabular data using conditional GAN. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 7333–7343 (2019) Goodfellow et al. [2014] Goodfellow, I.J., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., Bengio, Y.: Generative Adversarial Networks. arXiv, preprint (2014). https://doi.org/10.48550/arXiv.1406.2661 Sun and Erath [2015] Sun, L., Erath, A.: A bayesian network approach for population synthesis. Transportation Research Part C: Emerging Technologies 61, 49–62 (2015) https://doi.org/10.1016/j.trc.2015.10.010 Reiter [2005] Reiter, J.P.: Using cart to generate partially synthetic public use microdata. Journal of Official Statistics 21(3), 441–462 (2005) Drechsler and Reiter [2011] Drechsler, J., Reiter, J.P.: An empirical evaluation of easily implemented, nonparametric methods for generating synthetic datasets. Comput. Stat. Data Anal. 55(12), 3232–3243 (2011) European Medicines Agency [2018] European Medicines Agency: External guidance on the implementation of the European Medicines Agency policy on the publication of clinical data for medicinal products for human use. https://www.ema.europa.eu/en/human-regulatory/marketing-authorisation/clinical-data-publication/support-industry/external-guidance-implementation-european-medicines-agency-policy-publication-clinical-data (2018) Health Canada [2019] Health Canada: Public Release of Clinical Information: guidance document. https://www.canada.ca/en/health-canada/services/drug-health-product-review-approval/profile-public-release-clinical-information-guidance/document.html (2019) Karr et al. 
[2006] Karr, A.F., Kohnen, C.N., Oganian, A., Reiter, J.P., Sanil, A.P.: A framework for evaluating the utility of data altered to protect confidentiality. The American Statistician 60(3), 224–232 (2006) https://doi.org/10.1198/000313006x124640 Zhu et al. [2022] Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). https://doi.org/10.1109/BigData55660.2022.10020639 Villani [2009] Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. [2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. 
[2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Gower, J.C.: A general coefficient of similarity and some of its properties. Biometrics 27(4), 857–871 (1971) https://doi.org/10.2307/2528823 Kamal et al. 
[2019] Kamal, S., ElEleimy, M., Hegazy, D., Nasr, M.: Hepatitis C Virus (HCV) for Egyptian patients. UCI Machine Learning Repository, dataset (2019). https://doi.org/10.24432/c5989v Chawla et al. [2002] Chawla, N.V., Bowyer, K.W., Hall, L.O., Kegelmeyer, W.P.: SMOTE: synthetic minority over-sampling technique. J. Artif. Intell. Res. 16, 321–357 (2002) https://doi.org/10.1613/jair.953 Yoon et al. [2020] Yoon, J., Drumright, L.N., Schaar, M.: Anonymization through data synthesis using generative adversarial networks (ADS-GAN). IEEE J. Biomed. Health Informatics 24(8), 2378–2388 (2020) https://doi.org/10.1109/JBHI.2020.2980262 Xu et al. [2019] Xu, L., Skoularidou, M., Cuesta-Infante, A., Veeramachaneni, K.: Modeling tabular data using conditional GAN. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 7333–7343 (2019) Goodfellow et al. [2014] Goodfellow, I.J., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., Bengio, Y.: Generative Adversarial Networks. arXiv, preprint (2014). https://doi.org/10.48550/arXiv.1406.2661 Sun and Erath [2015] Sun, L., Erath, A.: A bayesian network approach for population synthesis. Transportation Research Part C: Emerging Technologies 61, 49–62 (2015) https://doi.org/10.1016/j.trc.2015.10.010 Reiter [2005] Reiter, J.P.: Using cart to generate partially synthetic public use microdata. Journal of Official Statistics 21(3), 441–462 (2005) Drechsler and Reiter [2011] Drechsler, J., Reiter, J.P.: An empirical evaluation of easily implemented, nonparametric methods for generating synthetic datasets. Comput. Stat. Data Anal. 
55(12), 3232–3243 (2011) European Medicines Agency [2018] European Medicines Agency: External guidance on the implementation of the European Medicines Agency policy on the publication of clinical data for medicinal products for human use. https://www.ema.europa.eu/en/human-regulatory/marketing-authorisation/clinical-data-publication/support-industry/external-guidance-implementation-european-medicines-agency-policy-publication-clinical-data (2018) Health Canada [2019] Health Canada: Public Release of Clinical Information: guidance document. https://www.canada.ca/en/health-canada/services/drug-health-product-review-approval/profile-public-release-clinical-information-guidance/document.html (2019) Karr et al. [2006] Karr, A.F., Kohnen, C.N., Oganian, A., Reiter, J.P., Sanil, A.P.: A framework for evaluating the utility of data altered to protect confidentiality. The American Statistician 60(3), 224–232 (2006) https://doi.org/10.1198/000313006x124640 Zhu et al. [2022] Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). https://doi.org/10.1109/BigData55660.2022.10020639 Villani [2009] Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. [2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. 
IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. 
[2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Kamal, S., ElEleimy, M., Hegazy, D., Nasr, M.: Hepatitis C Virus (HCV) for Egyptian patients. UCI Machine Learning Repository, dataset (2019). https://doi.org/10.24432/c5989v Chawla et al. [2002] Chawla, N.V., Bowyer, K.W., Hall, L.O., Kegelmeyer, W.P.: SMOTE: synthetic minority over-sampling technique. J. Artif. Intell. Res. 16, 321–357 (2002) https://doi.org/10.1613/jair.953 Yoon et al. [2020] Yoon, J., Drumright, L.N., Schaar, M.: Anonymization through data synthesis using generative adversarial networks (ADS-GAN). IEEE J. Biomed. Health Informatics 24(8), 2378–2388 (2020) https://doi.org/10.1109/JBHI.2020.2980262 Xu et al. [2019] Xu, L., Skoularidou, M., Cuesta-Infante, A., Veeramachaneni, K.: Modeling tabular data using conditional GAN. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 7333–7343 (2019) Goodfellow et al. [2014] Goodfellow, I.J., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., Bengio, Y.: Generative Adversarial Networks. arXiv, preprint (2014). https://doi.org/10.48550/arXiv.1406.2661 Sun and Erath [2015] Sun, L., Erath, A.: A bayesian network approach for population synthesis. 
[2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 European Medicines Agency: External guidance on the implementation of the European Medicines Agency policy on the publication of clinical data for medicinal products for human use. https://www.ema.europa.eu/en/human-regulatory/marketing-authorisation/clinical-data-publication/support-industry/external-guidance-implementation-european-medicines-agency-policy-publication-clinical-data (2018) Health Canada [2019] Health Canada: Public Release of Clinical Information: guidance document. https://www.canada.ca/en/health-canada/services/drug-health-product-review-approval/profile-public-release-clinical-information-guidance/document.html (2019) Karr et al. [2006] Karr, A.F., Kohnen, C.N., Oganian, A., Reiter, J.P., Sanil, A.P.: A framework for evaluating the utility of data altered to protect confidentiality. The American Statistician 60(3), 224–232 (2006) https://doi.org/10.1198/000313006x124640 Zhu et al. [2022] Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. 
In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). https://doi.org/10.1109/BigData55660.2022.10020639 Villani [2009] Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. [2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 
14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Health Canada: Public Release of Clinical Information: guidance document. https://www.canada.ca/en/health-canada/services/drug-health-product-review-approval/profile-public-release-clinical-information-guidance/document.html (2019) Karr et al. [2006] Karr, A.F., Kohnen, C.N., Oganian, A., Reiter, J.P., Sanil, A.P.: A framework for evaluating the utility of data altered to protect confidentiality. The American Statistician 60(3), 224–232 (2006) https://doi.org/10.1198/000313006x124640 Zhu et al. [2022] Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. 
In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). https://doi.org/10.1109/BigData55660.2022.10020639 Villani [2009] Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. [2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 
14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Karr, A.F., Kohnen, C.N., Oganian, A., Reiter, J.P., Sanil, A.P.: A framework for evaluating the utility of data altered to protect confidentiality. The American Statistician 60(3), 224–232 (2006) https://doi.org/10.1198/000313006x124640 Zhu et al. [2022] Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). 
https://doi.org/10.1109/BigData55660.2022.10020639 Villani [2009] Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. [2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. 
In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). https://doi.org/10.1109/BigData55660.2022.10020639 Villani [2009] Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. [2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. 
[2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. 
[2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. [2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. 
Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. 
[2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. 
[2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. 
[2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. 
Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). 
https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. 
[2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. 
Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. 
[2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. 
[2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. 
[2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083
55(12), 3232–3243 (2011) European Medicines Agency [2018] European Medicines Agency: External guidance on the implementation of the European Medicines Agency policy on the publication of clinical data for medicinal products for human use. https://www.ema.europa.eu/en/human-regulatory/marketing-authorisation/clinical-data-publication/support-industry/external-guidance-implementation-european-medicines-agency-policy-publication-clinical-data (2018) Health Canada [2019] Health Canada: Public Release of Clinical Information: guidance document. https://www.canada.ca/en/health-canada/services/drug-health-product-review-approval/profile-public-release-clinical-information-guidance/document.html (2019) Karr et al. [2006] Karr, A.F., Kohnen, C.N., Oganian, A., Reiter, J.P., Sanil, A.P.: A framework for evaluating the utility of data altered to protect confidentiality. The American Statistician 60(3), 224–232 (2006) https://doi.org/10.1198/000313006x124640 Zhu et al. [2022] Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). https://doi.org/10.1109/BigData55660.2022.10020639 Villani [2009] Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. [2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. 
IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. 
[2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Qian, Z., Cebere, B., Schaar, M.: Synthcity: facilitating innovative use cases of synthetic data in different data modalities. arXiv, preprint (2023). https://doi.org/10.48550/arXiv.2301.07573 DataCebo, Inc. [2023] DataCebo, Inc.: Synthetic Data Metrics. (2023). DataCebo, Inc.. Version 0.9.3. https://docs.sdv.dev/sdmetrics/ Brenninkmeijer [2021] Brenninkmeijer, B.: Table Evaluator. GitHub (2021) Yale et al. [2019] Yale, A., Dash, S., Dutta, R., Guyon, I., Pavao, A., Bennett, K.P.: Privacy preserving synthetic health data. In: 27th European Symposium on Artificial Neural Networks, ESANN 2019, Bruges, Belgium, April 24-26, 2019 (2019) Yan et al. [2022] Yan, C., Yan, Y., Wan, Z., Zhang, Z., Omberg, L., Guinney, J., Mooney, S.D., Malin, B.A.: A multifaceted benchmarking of synthetic electronic health record generation models. Nature Communications 13(1) (2022) https://doi.org/10.1038/s41467-022-35295-1 Murtaza et al. [2023] Murtaza, H., Ahmed, M., Khan, N.F., Murtaza, G., Zafar, S., Bano, A.: Synthetic data generation: State of the art in health care domain. Computer Science Review 48, 100546 (2023) https://doi.org/10.1016/j.cosrev.2023.100546 [22] Lautrup, A.D., Hyrup, T., Zimek, A., Schneider-Kamp, P.: Systematised review of generative modelling tools and utility metrics for fully synthetic tabular data. [under review] Gower [1971] Gower, J.C.: A general coefficient of similarity and some of its properties. Biometrics 27(4), 857–871 (1971) https://doi.org/10.2307/2528823 Kamal et al. 
[2019] Kamal, S., ElEleimy, M., Hegazy, D., Nasr, M.: Hepatitis C Virus (HCV) for Egyptian patients. UCI Machine Learning Repository, dataset (2019). https://doi.org/10.24432/c5989v Chawla et al. [2002] Chawla, N.V., Bowyer, K.W., Hall, L.O., Kegelmeyer, W.P.: SMOTE: synthetic minority over-sampling technique. J. Artif. Intell. Res. 16, 321–357 (2002) https://doi.org/10.1613/jair.953 Yoon et al. [2020] Yoon, J., Drumright, L.N., Schaar, M.: Anonymization through data synthesis using generative adversarial networks (ADS-GAN). IEEE J. Biomed. Health Informatics 24(8), 2378–2388 (2020) https://doi.org/10.1109/JBHI.2020.2980262 Xu et al. [2019] Xu, L., Skoularidou, M., Cuesta-Infante, A., Veeramachaneni, K.: Modeling tabular data using conditional GAN. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 7333–7343 (2019) Goodfellow et al. [2014] Goodfellow, I.J., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., Bengio, Y.: Generative Adversarial Networks. arXiv, preprint (2014). https://doi.org/10.48550/arXiv.1406.2661 Sun and Erath [2015] Sun, L., Erath, A.: A bayesian network approach for population synthesis. Transportation Research Part C: Emerging Technologies 61, 49–62 (2015) https://doi.org/10.1016/j.trc.2015.10.010 Reiter [2005] Reiter, J.P.: Using cart to generate partially synthetic public use microdata. Journal of Official Statistics 21(3), 441–462 (2005) Drechsler and Reiter [2011] Drechsler, J., Reiter, J.P.: An empirical evaluation of easily implemented, nonparametric methods for generating synthetic datasets. Comput. Stat. Data Anal. 
55(12), 3232–3243 (2011) European Medicines Agency [2018] European Medicines Agency: External guidance on the implementation of the European Medicines Agency policy on the publication of clinical data for medicinal products for human use. https://www.ema.europa.eu/en/human-regulatory/marketing-authorisation/clinical-data-publication/support-industry/external-guidance-implementation-european-medicines-agency-policy-publication-clinical-data (2018) Health Canada [2019] Health Canada: Public Release of Clinical Information: guidance document. https://www.canada.ca/en/health-canada/services/drug-health-product-review-approval/profile-public-release-clinical-information-guidance/document.html (2019) Karr et al. [2006] Karr, A.F., Kohnen, C.N., Oganian, A., Reiter, J.P., Sanil, A.P.: A framework for evaluating the utility of data altered to protect confidentiality. The American Statistician 60(3), 224–232 (2006) https://doi.org/10.1198/000313006x124640 Zhu et al. [2022] Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). https://doi.org/10.1109/BigData55660.2022.10020639 Villani [2009] Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. [2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. 
IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. 
[2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 DataCebo, Inc.: Synthetic Data Metrics. (2023). DataCebo, Inc.. Version 0.9.3. https://docs.sdv.dev/sdmetrics/ Brenninkmeijer [2021] Brenninkmeijer, B.: Table Evaluator. GitHub (2021) Yale et al. [2019] Yale, A., Dash, S., Dutta, R., Guyon, I., Pavao, A., Bennett, K.P.: Privacy preserving synthetic health data. In: 27th European Symposium on Artificial Neural Networks, ESANN 2019, Bruges, Belgium, April 24-26, 2019 (2019) Yan et al. [2022] Yan, C., Yan, Y., Wan, Z., Zhang, Z., Omberg, L., Guinney, J., Mooney, S.D., Malin, B.A.: A multifaceted benchmarking of synthetic electronic health record generation models. Nature Communications 13(1) (2022) https://doi.org/10.1038/s41467-022-35295-1 Murtaza et al. [2023] Murtaza, H., Ahmed, M., Khan, N.F., Murtaza, G., Zafar, S., Bano, A.: Synthetic data generation: State of the art in health care domain. Computer Science Review 48, 100546 (2023) https://doi.org/10.1016/j.cosrev.2023.100546 [22] Lautrup, A.D., Hyrup, T., Zimek, A., Schneider-Kamp, P.: Systematised review of generative modelling tools and utility metrics for fully synthetic tabular data. [under review] Gower [1971] Gower, J.C.: A general coefficient of similarity and some of its properties. Biometrics 27(4), 857–871 (1971) https://doi.org/10.2307/2528823 Kamal et al. [2019] Kamal, S., ElEleimy, M., Hegazy, D., Nasr, M.: Hepatitis C Virus (HCV) for Egyptian patients. UCI Machine Learning Repository, dataset (2019). https://doi.org/10.24432/c5989v Chawla et al. 
[2002] Chawla, N.V., Bowyer, K.W., Hall, L.O., Kegelmeyer, W.P.: SMOTE: synthetic minority over-sampling technique. J. Artif. Intell. Res. 16, 321–357 (2002) https://doi.org/10.1613/jair.953 Yoon et al. [2020] Yoon, J., Drumright, L.N., Schaar, M.: Anonymization through data synthesis using generative adversarial networks (ADS-GAN). IEEE J. Biomed. Health Informatics 24(8), 2378–2388 (2020) https://doi.org/10.1109/JBHI.2020.2980262 Xu et al. [2019] Xu, L., Skoularidou, M., Cuesta-Infante, A., Veeramachaneni, K.: Modeling tabular data using conditional GAN. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 7333–7343 (2019) Goodfellow et al. [2014] Goodfellow, I.J., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., Bengio, Y.: Generative Adversarial Networks. arXiv, preprint (2014). https://doi.org/10.48550/arXiv.1406.2661 Sun and Erath [2015] Sun, L., Erath, A.: A bayesian network approach for population synthesis. Transportation Research Part C: Emerging Technologies 61, 49–62 (2015) https://doi.org/10.1016/j.trc.2015.10.010 Reiter [2005] Reiter, J.P.: Using cart to generate partially synthetic public use microdata. Journal of Official Statistics 21(3), 441–462 (2005) Drechsler and Reiter [2011] Drechsler, J., Reiter, J.P.: An empirical evaluation of easily implemented, nonparametric methods for generating synthetic datasets. Comput. Stat. Data Anal. 55(12), 3232–3243 (2011) European Medicines Agency [2018] European Medicines Agency: External guidance on the implementation of the European Medicines Agency policy on the publication of clinical data for medicinal products for human use. 
https://www.ema.europa.eu/en/human-regulatory/marketing-authorisation/clinical-data-publication/support-industry/external-guidance-implementation-european-medicines-agency-policy-publication-clinical-data (2018) Health Canada [2019] Health Canada: Public Release of Clinical Information: guidance document. https://www.canada.ca/en/health-canada/services/drug-health-product-review-approval/profile-public-release-clinical-information-guidance/document.html (2019) Karr et al. [2006] Karr, A.F., Kohnen, C.N., Oganian, A., Reiter, J.P., Sanil, A.P.: A framework for evaluating the utility of data altered to protect confidentiality. The American Statistician 60(3), 224–232 (2006) https://doi.org/10.1198/000313006x124640 Zhu et al. [2022] Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). https://doi.org/10.1109/BigData55660.2022.10020639 Villani [2009] Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. [2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. 
[2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. 
[2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Brenninkmeijer, B.: Table Evaluator. GitHub (2021) Yale et al. [2019] Yale, A., Dash, S., Dutta, R., Guyon, I., Pavao, A., Bennett, K.P.: Privacy preserving synthetic health data. In: 27th European Symposium on Artificial Neural Networks, ESANN 2019, Bruges, Belgium, April 24-26, 2019 (2019) Yan et al. [2022] Yan, C., Yan, Y., Wan, Z., Zhang, Z., Omberg, L., Guinney, J., Mooney, S.D., Malin, B.A.: A multifaceted benchmarking of synthetic electronic health record generation models. Nature Communications 13(1) (2022) https://doi.org/10.1038/s41467-022-35295-1 Murtaza et al. [2023] Murtaza, H., Ahmed, M., Khan, N.F., Murtaza, G., Zafar, S., Bano, A.: Synthetic data generation: State of the art in health care domain. Computer Science Review 48, 100546 (2023) https://doi.org/10.1016/j.cosrev.2023.100546 [22] Lautrup, A.D., Hyrup, T., Zimek, A., Schneider-Kamp, P.: Systematised review of generative modelling tools and utility metrics for fully synthetic tabular data. [under review] Gower [1971] Gower, J.C.: A general coefficient of similarity and some of its properties. Biometrics 27(4), 857–871 (1971) https://doi.org/10.2307/2528823 Kamal et al. [2019] Kamal, S., ElEleimy, M., Hegazy, D., Nasr, M.: Hepatitis C Virus (HCV) for Egyptian patients. UCI Machine Learning Repository, dataset (2019). https://doi.org/10.24432/c5989v Chawla et al. [2002] Chawla, N.V., Bowyer, K.W., Hall, L.O., Kegelmeyer, W.P.: SMOTE: synthetic minority over-sampling technique. J. Artif. Intell. Res. 16, 321–357 (2002) https://doi.org/10.1613/jair.953 Yoon et al. [2020] Yoon, J., Drumright, L.N., Schaar, M.: Anonymization through data synthesis using generative adversarial networks (ADS-GAN). IEEE J. Biomed. Health Informatics 24(8), 2378–2388 (2020) https://doi.org/10.1109/JBHI.2020.2980262 Xu et al. 
[2019] Xu, L., Skoularidou, M., Cuesta-Infante, A., Veeramachaneni, K.: Modeling tabular data using conditional GAN. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 7333–7343 (2019) Goodfellow et al. [2014] Goodfellow, I.J., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., Bengio, Y.: Generative Adversarial Networks. arXiv, preprint (2014). https://doi.org/10.48550/arXiv.1406.2661 Sun and Erath [2015] Sun, L., Erath, A.: A bayesian network approach for population synthesis. Transportation Research Part C: Emerging Technologies 61, 49–62 (2015) https://doi.org/10.1016/j.trc.2015.10.010 Reiter [2005] Reiter, J.P.: Using cart to generate partially synthetic public use microdata. Journal of Official Statistics 21(3), 441–462 (2005) Drechsler and Reiter [2011] Drechsler, J., Reiter, J.P.: An empirical evaluation of easily implemented, nonparametric methods for generating synthetic datasets. Comput. Stat. Data Anal. 55(12), 3232–3243 (2011) European Medicines Agency [2018] European Medicines Agency: External guidance on the implementation of the European Medicines Agency policy on the publication of clinical data for medicinal products for human use. https://www.ema.europa.eu/en/human-regulatory/marketing-authorisation/clinical-data-publication/support-industry/external-guidance-implementation-european-medicines-agency-policy-publication-clinical-data (2018) Health Canada [2019] Health Canada: Public Release of Clinical Information: guidance document. https://www.canada.ca/en/health-canada/services/drug-health-product-review-approval/profile-public-release-clinical-information-guidance/document.html (2019) Karr et al. 
[2006] Karr, A.F., Kohnen, C.N., Oganian, A., Reiter, J.P., Sanil, A.P.: A framework for evaluating the utility of data altered to protect confidentiality. The American Statistician 60(3), 224–232 (2006) https://doi.org/10.1198/000313006x124640 Zhu et al. [2022] Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). https://doi.org/10.1109/BigData55660.2022.10020639 Villani [2009] Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. [2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. 
[2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. 
[2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Chawla, N.V., Bowyer, K.W., Hall, L.O., Kegelmeyer, W.P.: SMOTE: synthetic minority over-sampling technique. J. Artif. Intell. Res. 16, 321–357 (2002) https://doi.org/10.1613/jair.953 Yoon et al. [2020] Yoon, J., Drumright, L.N., Schaar, M.: Anonymization through data synthesis using generative adversarial networks (ADS-GAN). IEEE J. Biomed. Health Informatics 24(8), 2378–2388 (2020) https://doi.org/10.1109/JBHI.2020.2980262 Xu et al. [2019] Xu, L., Skoularidou, M., Cuesta-Infante, A., Veeramachaneni, K.: Modeling tabular data using conditional GAN. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 7333–7343 (2019) Goodfellow et al. [2014] Goodfellow, I.J., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., Bengio, Y.: Generative Adversarial Networks. arXiv, preprint (2014). https://doi.org/10.48550/arXiv.1406.2661 Sun and Erath [2015] Sun, L., Erath, A.: A bayesian network approach for population synthesis. Transportation Research Part C: Emerging Technologies 61, 49–62 (2015) https://doi.org/10.1016/j.trc.2015.10.010 Reiter [2005] Reiter, J.P.: Using cart to generate partially synthetic public use microdata. Journal of Official Statistics 21(3), 441–462 (2005) Drechsler and Reiter [2011] Drechsler, J., Reiter, J.P.: An empirical evaluation of easily implemented, nonparametric methods for generating synthetic datasets. Comput. Stat. Data Anal. 
55(12), 3232–3243 (2011) European Medicines Agency [2018] European Medicines Agency: External guidance on the implementation of the European Medicines Agency policy on the publication of clinical data for medicinal products for human use. https://www.ema.europa.eu/en/human-regulatory/marketing-authorisation/clinical-data-publication/support-industry/external-guidance-implementation-european-medicines-agency-policy-publication-clinical-data (2018) Health Canada [2019] Health Canada: Public Release of Clinical Information: guidance document. https://www.canada.ca/en/health-canada/services/drug-health-product-review-approval/profile-public-release-clinical-information-guidance/document.html (2019) Karr et al. [2006] Karr, A.F., Kohnen, C.N., Oganian, A., Reiter, J.P., Sanil, A.P.: A framework for evaluating the utility of data altered to protect confidentiality. The American Statistician 60(3), 224–232 (2006) https://doi.org/10.1198/000313006x124640 Zhu et al. [2022] Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). https://doi.org/10.1109/BigData55660.2022.10020639 Villani [2009] Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. [2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. 
IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. 
[2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Yoon, J., Drumright, L.N., Schaar, M.: Anonymization through data synthesis using generative adversarial networks (ADS-GAN). IEEE J. Biomed. Health Informatics 24(8), 2378–2388 (2020) https://doi.org/10.1109/JBHI.2020.2980262 Xu et al. [2019] Xu, L., Skoularidou, M., Cuesta-Infante, A., Veeramachaneni, K.: Modeling tabular data using conditional GAN. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 7333–7343 (2019) Goodfellow et al. [2014] Goodfellow, I.J., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., Bengio, Y.: Generative Adversarial Networks. arXiv, preprint (2014). https://doi.org/10.48550/arXiv.1406.2661 Sun and Erath [2015] Sun, L., Erath, A.: A bayesian network approach for population synthesis. Transportation Research Part C: Emerging Technologies 61, 49–62 (2015) https://doi.org/10.1016/j.trc.2015.10.010 Reiter [2005] Reiter, J.P.: Using cart to generate partially synthetic public use microdata. Journal of Official Statistics 21(3), 441–462 (2005) Drechsler and Reiter [2011] Drechsler, J., Reiter, J.P.: An empirical evaluation of easily implemented, nonparametric methods for generating synthetic datasets. Comput. Stat. Data Anal. 
55(12), 3232–3243 (2011) European Medicines Agency [2018] European Medicines Agency: External guidance on the implementation of the European Medicines Agency policy on the publication of clinical data for medicinal products for human use. https://www.ema.europa.eu/en/human-regulatory/marketing-authorisation/clinical-data-publication/support-industry/external-guidance-implementation-european-medicines-agency-policy-publication-clinical-data (2018) Health Canada [2019] Health Canada: Public Release of Clinical Information: guidance document. https://www.canada.ca/en/health-canada/services/drug-health-product-review-approval/profile-public-release-clinical-information-guidance/document.html (2019) Karr et al. [2006] Karr, A.F., Kohnen, C.N., Oganian, A., Reiter, J.P., Sanil, A.P.: A framework for evaluating the utility of data altered to protect confidentiality. The American Statistician 60(3), 224–232 (2006) https://doi.org/10.1198/000313006x124640 Zhu et al. [2022] Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). https://doi.org/10.1109/BigData55660.2022.10020639 Villani [2009] Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. [2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. 
IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. 
[2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Xu, L., Skoularidou, M., Cuesta-Infante, A., Veeramachaneni, K.: Modeling tabular data using conditional GAN. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 7333–7343 (2019) Goodfellow et al. [2014] Goodfellow, I.J., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., Bengio, Y.: Generative Adversarial Networks. arXiv, preprint (2014). https://doi.org/10.48550/arXiv.1406.2661 Sun and Erath [2015] Sun, L., Erath, A.: A bayesian network approach for population synthesis. Transportation Research Part C: Emerging Technologies 61, 49–62 (2015) https://doi.org/10.1016/j.trc.2015.10.010 Reiter [2005] Reiter, J.P.: Using cart to generate partially synthetic public use microdata. Journal of Official Statistics 21(3), 441–462 (2005) Drechsler and Reiter [2011] Drechsler, J., Reiter, J.P.: An empirical evaluation of easily implemented, nonparametric methods for generating synthetic datasets. Comput. Stat. Data Anal. 55(12), 3232–3243 (2011) European Medicines Agency [2018] European Medicines Agency: External guidance on the implementation of the European Medicines Agency policy on the publication of clinical data for medicinal products for human use. 
https://www.ema.europa.eu/en/human-regulatory/marketing-authorisation/clinical-data-publication/support-industry/external-guidance-implementation-european-medicines-agency-policy-publication-clinical-data (2018) Health Canada [2019] Health Canada: Public Release of Clinical Information: guidance document. https://www.canada.ca/en/health-canada/services/drug-health-product-review-approval/profile-public-release-clinical-information-guidance/document.html (2019) Karr et al. [2006] Karr, A.F., Kohnen, C.N., Oganian, A., Reiter, J.P., Sanil, A.P.: A framework for evaluating the utility of data altered to protect confidentiality. The American Statistician 60(3), 224–232 (2006) https://doi.org/10.1198/000313006x124640 Zhu et al. [2022] Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). https://doi.org/10.1109/BigData55660.2022.10020639 Villani [2009] Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. [2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. 
[2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. 
[2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Goodfellow, I.J., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., Bengio, Y.: Generative Adversarial Networks. arXiv, preprint (2014). https://doi.org/10.48550/arXiv.1406.2661 Sun and Erath [2015] Sun, L., Erath, A.: A bayesian network approach for population synthesis. Transportation Research Part C: Emerging Technologies 61, 49–62 (2015) https://doi.org/10.1016/j.trc.2015.10.010 Reiter [2005] Reiter, J.P.: Using cart to generate partially synthetic public use microdata. Journal of Official Statistics 21(3), 441–462 (2005) Drechsler and Reiter [2011] Drechsler, J., Reiter, J.P.: An empirical evaluation of easily implemented, nonparametric methods for generating synthetic datasets. Comput. Stat. Data Anal. 55(12), 3232–3243 (2011) European Medicines Agency [2018] European Medicines Agency: External guidance on the implementation of the European Medicines Agency policy on the publication of clinical data for medicinal products for human use. https://www.ema.europa.eu/en/human-regulatory/marketing-authorisation/clinical-data-publication/support-industry/external-guidance-implementation-european-medicines-agency-policy-publication-clinical-data (2018) Health Canada [2019] Health Canada: Public Release of Clinical Information: guidance document. https://www.canada.ca/en/health-canada/services/drug-health-product-review-approval/profile-public-release-clinical-information-guidance/document.html (2019) Karr et al. [2006] Karr, A.F., Kohnen, C.N., Oganian, A., Reiter, J.P., Sanil, A.P.: A framework for evaluating the utility of data altered to protect confidentiality. The American Statistician 60(3), 224–232 (2006) https://doi.org/10.1198/000313006x124640 Zhu et al. [2022] Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. 
In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). https://doi.org/10.1109/BigData55660.2022.10020639 Villani [2009] Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. [2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 
14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Sun, L., Erath, A.: A bayesian network approach for population synthesis. Transportation Research Part C: Emerging Technologies 61, 49–62 (2015) https://doi.org/10.1016/j.trc.2015.10.010 Reiter [2005] Reiter, J.P.: Using cart to generate partially synthetic public use microdata. Journal of Official Statistics 21(3), 441–462 (2005) Drechsler and Reiter [2011] Drechsler, J., Reiter, J.P.: An empirical evaluation of easily implemented, nonparametric methods for generating synthetic datasets. Comput. Stat. Data Anal. 
[2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. 
[2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 
13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. 
AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. 
[2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083
  12. Yale, A., Dash, S., Bhanot, K., Guyon, I., Erickson, J.S., Bennett, K.P.: Synthesizing quality open data assets from private health research studies. In: Abramowicz, W., Klein, G. (eds.) Business Information Systems Workshops - BIS 2020 International Workshops, Colorado Springs, CO, USA, June 8-10, 2020, Revised Selected Papers. Lecture Notes in Business Information Processing, vol. 394, pp. 324–335 (2020). https://doi.org/10.1007/978-3-030-61146-0_26
Lenz et al. [2021] Lenz, S., Hess, M., Binder, H.: Deep generative models in DataSHIELD. BMC Medical Research Methodology 21(1) (2021) https://doi.org/10.1186/s12874-021-01237-6
Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X., El-Hussuna, A.: Utility metrics for evaluating synthetic health data generation methods: Validation study. JMIR Medical Informatics 10(4), 35734 (2022) https://doi.org/10.2196/35734
Qian et al. [2023] Qian, Z., Cebere, B., Schaar, M.: Synthcity: facilitating innovative use cases of synthetic data in different data modalities. arXiv, preprint (2023). https://doi.org/10.48550/arXiv.2301.07573
DataCebo, Inc. [2023] DataCebo, Inc.: Synthetic Data Metrics. (2023). DataCebo, Inc. Version 0.9.3. https://docs.sdv.dev/sdmetrics/
Brenninkmeijer [2021] Brenninkmeijer, B.: Table Evaluator. GitHub (2021)
Yale et al. [2019] Yale, A., Dash, S., Dutta, R., Guyon, I., Pavao, A., Bennett, K.P.: Privacy preserving synthetic health data. In: 27th European Symposium on Artificial Neural Networks, ESANN 2019, Bruges, Belgium, April 24-26, 2019 (2019)
Yan et al. [2022] Yan, C., Yan, Y., Wan, Z., Zhang, Z., Omberg, L., Guinney, J., Mooney, S.D., Malin, B.A.: A multifaceted benchmarking of synthetic electronic health record generation models. Nature Communications 13(1) (2022) https://doi.org/10.1038/s41467-022-35295-1
Murtaza et al. [2023] Murtaza, H., Ahmed, M., Khan, N.F., Murtaza, G., Zafar, S., Bano, A.: Synthetic data generation: State of the art in health care domain. Computer Science Review 48, 100546 (2023) https://doi.org/10.1016/j.cosrev.2023.100546
[22] Lautrup, A.D., Hyrup, T., Zimek, A., Schneider-Kamp, P.: Systematised review of generative modelling tools and utility metrics for fully synthetic tabular data. [under review]
Gower [1971] Gower, J.C.: A general coefficient of similarity and some of its properties. Biometrics 27(4), 857–871 (1971) https://doi.org/10.2307/2528823
Kamal et al. [2019] Kamal, S., ElEleimy, M., Hegazy, D., Nasr, M.: Hepatitis C Virus (HCV) for Egyptian patients. UCI Machine Learning Repository, dataset (2019). https://doi.org/10.24432/c5989v
Chawla et al. [2002] Chawla, N.V., Bowyer, K.W., Hall, L.O., Kegelmeyer, W.P.: SMOTE: synthetic minority over-sampling technique. J. Artif. Intell. Res. 16, 321–357 (2002) https://doi.org/10.1613/jair.953
Yoon et al. [2020] Yoon, J., Drumright, L.N., Schaar, M.: Anonymization through data synthesis using generative adversarial networks (ADS-GAN). IEEE J. Biomed. Health Informatics 24(8), 2378–2388 (2020) https://doi.org/10.1109/JBHI.2020.2980262
Xu et al. [2019] Xu, L., Skoularidou, M., Cuesta-Infante, A., Veeramachaneni, K.: Modeling tabular data using conditional GAN. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 7333–7343 (2019)
Goodfellow et al. [2014] Goodfellow, I.J., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., Bengio, Y.: Generative Adversarial Networks. arXiv, preprint (2014). https://doi.org/10.48550/arXiv.1406.2661
Sun and Erath [2015] Sun, L., Erath, A.: A Bayesian network approach for population synthesis. Transportation Research Part C: Emerging Technologies 61, 49–62 (2015) https://doi.org/10.1016/j.trc.2015.10.010
Reiter [2005] Reiter, J.P.: Using CART to generate partially synthetic public use microdata. Journal of Official Statistics 21(3), 441–462 (2005)
Drechsler and Reiter [2011] Drechsler, J., Reiter, J.P.: An empirical evaluation of easily implemented, nonparametric methods for generating synthetic datasets. Comput. Stat. Data Anal. 55(12), 3232–3243 (2011)
European Medicines Agency [2018] European Medicines Agency: External guidance on the implementation of the European Medicines Agency policy on the publication of clinical data for medicinal products for human use. https://www.ema.europa.eu/en/human-regulatory/marketing-authorisation/clinical-data-publication/support-industry/external-guidance-implementation-european-medicines-agency-policy-publication-clinical-data (2018)
Health Canada [2019] Health Canada: Public Release of Clinical Information: guidance document. https://www.canada.ca/en/health-canada/services/drug-health-product-review-approval/profile-public-release-clinical-information-guidance/document.html (2019)
Karr et al. [2006] Karr, A.F., Kohnen, C.N., Oganian, A., Reiter, J.P., Sanil, A.P.: A framework for evaluating the utility of data altered to protect confidentiality. The American Statistician 60(3), 224–232 (2006) https://doi.org/10.1198/000313006x124640
Zhu et al. [2022] Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) IEEE International Conference on Big Data, Big Data 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). https://doi.org/10.1109/BigData55660.2022.10020639
Villani [2009] Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9
Hesterberg et al. [2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16
Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722
Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605
Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568
Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078
Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358
Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298
Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420
Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802
Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369
Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020)
Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083
arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Qian, Z., Cebere, B., Schaar, M.: Synthcity: facilitating innovative use cases of synthetic data in different data modalities. 
arXiv, preprint (2023). https://doi.org/10.48550/arXiv.2301.07573 DataCebo, Inc. [2023] DataCebo, Inc.: Synthetic Data Metrics. (2023). DataCebo, Inc.. Version 0.9.3. https://docs.sdv.dev/sdmetrics/ Brenninkmeijer [2021] Brenninkmeijer, B.: Table Evaluator. GitHub (2021) Yale et al. [2019] Yale, A., Dash, S., Dutta, R., Guyon, I., Pavao, A., Bennett, K.P.: Privacy preserving synthetic health data. In: 27th European Symposium on Artificial Neural Networks, ESANN 2019, Bruges, Belgium, April 24-26, 2019 (2019) Yan et al. [2022] Yan, C., Yan, Y., Wan, Z., Zhang, Z., Omberg, L., Guinney, J., Mooney, S.D., Malin, B.A.: A multifaceted benchmarking of synthetic electronic health record generation models. Nature Communications 13(1) (2022) https://doi.org/10.1038/s41467-022-35295-1 Murtaza et al. [2023] Murtaza, H., Ahmed, M., Khan, N.F., Murtaza, G., Zafar, S., Bano, A.: Synthetic data generation: State of the art in health care domain. Computer Science Review 48, 100546 (2023) https://doi.org/10.1016/j.cosrev.2023.100546 [22] Lautrup, A.D., Hyrup, T., Zimek, A., Schneider-Kamp, P.: Systematised review of generative modelling tools and utility metrics for fully synthetic tabular data. [under review] Gower [1971] Gower, J.C.: A general coefficient of similarity and some of its properties. Biometrics 27(4), 857–871 (1971) https://doi.org/10.2307/2528823 Kamal et al. [2019] Kamal, S., ElEleimy, M., Hegazy, D., Nasr, M.: Hepatitis C Virus (HCV) for Egyptian patients. UCI Machine Learning Repository, dataset (2019). https://doi.org/10.24432/c5989v Chawla et al. [2002] Chawla, N.V., Bowyer, K.W., Hall, L.O., Kegelmeyer, W.P.: SMOTE: synthetic minority over-sampling technique. J. Artif. Intell. Res. 16, 321–357 (2002) https://doi.org/10.1613/jair.953 Yoon et al. [2020] Yoon, J., Drumright, L.N., Schaar, M.: Anonymization through data synthesis using generative adversarial networks (ADS-GAN). IEEE J. Biomed. 
Health Informatics 24(8), 2378–2388 (2020) https://doi.org/10.1109/JBHI.2020.2980262 Xu et al. [2019] Xu, L., Skoularidou, M., Cuesta-Infante, A., Veeramachaneni, K.: Modeling tabular data using conditional GAN. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 7333–7343 (2019) Goodfellow et al. [2014] Goodfellow, I.J., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., Bengio, Y.: Generative Adversarial Networks. arXiv, preprint (2014). https://doi.org/10.48550/arXiv.1406.2661 Sun and Erath [2015] Sun, L., Erath, A.: A bayesian network approach for population synthesis. Transportation Research Part C: Emerging Technologies 61, 49–62 (2015) https://doi.org/10.1016/j.trc.2015.10.010 Reiter [2005] Reiter, J.P.: Using cart to generate partially synthetic public use microdata. Journal of Official Statistics 21(3), 441–462 (2005) Drechsler and Reiter [2011] Drechsler, J., Reiter, J.P.: An empirical evaluation of easily implemented, nonparametric methods for generating synthetic datasets. Comput. Stat. Data Anal. 55(12), 3232–3243 (2011) European Medicines Agency [2018] European Medicines Agency: External guidance on the implementation of the European Medicines Agency policy on the publication of clinical data for medicinal products for human use. https://www.ema.europa.eu/en/human-regulatory/marketing-authorisation/clinical-data-publication/support-industry/external-guidance-implementation-european-medicines-agency-policy-publication-clinical-data (2018) Health Canada [2019] Health Canada: Public Release of Clinical Information: guidance document. https://www.canada.ca/en/health-canada/services/drug-health-product-review-approval/profile-public-release-clinical-information-guidance/document.html (2019) Karr et al. 
[2006] Karr, A.F., Kohnen, C.N., Oganian, A., Reiter, J.P., Sanil, A.P.: A framework for evaluating the utility of data altered to protect confidentiality. The American Statistician 60(3), 224–232 (2006) https://doi.org/10.1198/000313006x124640 Zhu et al. [2022] Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). https://doi.org/10.1109/BigData55660.2022.10020639 Villani [2009] Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. [2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. 
[2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 DataCebo, Inc.: Synthetic Data Metrics. (2023). DataCebo, Inc.. Version 0.9.3. https://docs.sdv.dev/sdmetrics/ Brenninkmeijer [2021] Brenninkmeijer, B.: Table Evaluator. GitHub (2021) Yale et al. 
[2019] Yale, A., Dash, S., Dutta, R., Guyon, I., Pavao, A., Bennett, K.P.: Privacy preserving synthetic health data. In: 27th European Symposium on Artificial Neural Networks, ESANN 2019, Bruges, Belgium, April 24-26, 2019 (2019) Yan et al. [2022] Yan, C., Yan, Y., Wan, Z., Zhang, Z., Omberg, L., Guinney, J., Mooney, S.D., Malin, B.A.: A multifaceted benchmarking of synthetic electronic health record generation models. Nature Communications 13(1) (2022) https://doi.org/10.1038/s41467-022-35295-1 Murtaza et al. [2023] Murtaza, H., Ahmed, M., Khan, N.F., Murtaza, G., Zafar, S., Bano, A.: Synthetic data generation: State of the art in health care domain. Computer Science Review 48, 100546 (2023) https://doi.org/10.1016/j.cosrev.2023.100546 [22] Lautrup, A.D., Hyrup, T., Zimek, A., Schneider-Kamp, P.: Systematised review of generative modelling tools and utility metrics for fully synthetic tabular data. [under review] Gower [1971] Gower, J.C.: A general coefficient of similarity and some of its properties. Biometrics 27(4), 857–871 (1971) https://doi.org/10.2307/2528823 Kamal et al. [2019] Kamal, S., ElEleimy, M., Hegazy, D., Nasr, M.: Hepatitis C Virus (HCV) for Egyptian patients. UCI Machine Learning Repository, dataset (2019). https://doi.org/10.24432/c5989v Chawla et al. [2002] Chawla, N.V., Bowyer, K.W., Hall, L.O., Kegelmeyer, W.P.: SMOTE: synthetic minority over-sampling technique. J. Artif. Intell. Res. 16, 321–357 (2002) https://doi.org/10.1613/jair.953 Yoon et al. [2020] Yoon, J., Drumright, L.N., Schaar, M.: Anonymization through data synthesis using generative adversarial networks (ADS-GAN). IEEE J. Biomed. Health Informatics 24(8), 2378–2388 (2020) https://doi.org/10.1109/JBHI.2020.2980262 Xu et al. [2019] Xu, L., Skoularidou, M., Cuesta-Infante, A., Veeramachaneni, K.: Modeling tabular data using conditional GAN. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) 
Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 7333–7343 (2019) Goodfellow et al. [2014] Goodfellow, I.J., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., Bengio, Y.: Generative Adversarial Networks. arXiv, preprint (2014). https://doi.org/10.48550/arXiv.1406.2661 Sun and Erath [2015] Sun, L., Erath, A.: A bayesian network approach for population synthesis. Transportation Research Part C: Emerging Technologies 61, 49–62 (2015) https://doi.org/10.1016/j.trc.2015.10.010 Reiter [2005] Reiter, J.P.: Using cart to generate partially synthetic public use microdata. Journal of Official Statistics 21(3), 441–462 (2005) Drechsler and Reiter [2011] Drechsler, J., Reiter, J.P.: An empirical evaluation of easily implemented, nonparametric methods for generating synthetic datasets. Comput. Stat. Data Anal. 55(12), 3232–3243 (2011) European Medicines Agency [2018] European Medicines Agency: External guidance on the implementation of the European Medicines Agency policy on the publication of clinical data for medicinal products for human use. https://www.ema.europa.eu/en/human-regulatory/marketing-authorisation/clinical-data-publication/support-industry/external-guidance-implementation-european-medicines-agency-policy-publication-clinical-data (2018) Health Canada [2019] Health Canada: Public Release of Clinical Information: guidance document. https://www.canada.ca/en/health-canada/services/drug-health-product-review-approval/profile-public-release-clinical-information-guidance/document.html (2019) Karr et al. [2006] Karr, A.F., Kohnen, C.N., Oganian, A., Reiter, J.P., Sanil, A.P.: A framework for evaluating the utility of data altered to protect confidentiality. The American Statistician 60(3), 224–232 (2006) https://doi.org/10.1198/000313006x124640 Zhu et al. 
[2022] Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). https://doi.org/10.1109/BigData55660.2022.10020639 Villani [2009] Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. [2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. 
Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Brenninkmeijer, B.: Table Evaluator. GitHub (2021) Yale et al. [2019] Yale, A., Dash, S., Dutta, R., Guyon, I., Pavao, A., Bennett, K.P.: Privacy preserving synthetic health data. In: 27th European Symposium on Artificial Neural Networks, ESANN 2019, Bruges, Belgium, April 24-26, 2019 (2019) Yan et al. 
[2022] Yan, C., Yan, Y., Wan, Z., Zhang, Z., Omberg, L., Guinney, J., Mooney, S.D., Malin, B.A.: A multifaceted benchmarking of synthetic electronic health record generation models. Nature Communications 13(1) (2022) https://doi.org/10.1038/s41467-022-35295-1 Murtaza et al. [2023] Murtaza, H., Ahmed, M., Khan, N.F., Murtaza, G., Zafar, S., Bano, A.: Synthetic data generation: State of the art in health care domain. Computer Science Review 48, 100546 (2023) https://doi.org/10.1016/j.cosrev.2023.100546 [22] Lautrup, A.D., Hyrup, T., Zimek, A., Schneider-Kamp, P.: Systematised review of generative modelling tools and utility metrics for fully synthetic tabular data. [under review] Gower [1971] Gower, J.C.: A general coefficient of similarity and some of its properties. Biometrics 27(4), 857–871 (1971) https://doi.org/10.2307/2528823 Kamal et al. [2019] Kamal, S., ElEleimy, M., Hegazy, D., Nasr, M.: Hepatitis C Virus (HCV) for Egyptian patients. UCI Machine Learning Repository, dataset (2019). https://doi.org/10.24432/c5989v Chawla et al. [2002] Chawla, N.V., Bowyer, K.W., Hall, L.O., Kegelmeyer, W.P.: SMOTE: synthetic minority over-sampling technique. J. Artif. Intell. Res. 16, 321–357 (2002) https://doi.org/10.1613/jair.953 Yoon et al. [2020] Yoon, J., Drumright, L.N., Schaar, M.: Anonymization through data synthesis using generative adversarial networks (ADS-GAN). IEEE J. Biomed. Health Informatics 24(8), 2378–2388 (2020) https://doi.org/10.1109/JBHI.2020.2980262 Xu et al. [2019] Xu, L., Skoularidou, M., Cuesta-Infante, A., Veeramachaneni, K.: Modeling tabular data using conditional GAN. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 7333–7343 (2019) Goodfellow et al. 
[2014] Goodfellow, I.J., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., Bengio, Y.: Generative Adversarial Networks. arXiv, preprint (2014). https://doi.org/10.48550/arXiv.1406.2661 Sun and Erath [2015] Sun, L., Erath, A.: A bayesian network approach for population synthesis. Transportation Research Part C: Emerging Technologies 61, 49–62 (2015) https://doi.org/10.1016/j.trc.2015.10.010 Reiter [2005] Reiter, J.P.: Using cart to generate partially synthetic public use microdata. Journal of Official Statistics 21(3), 441–462 (2005) Drechsler and Reiter [2011] Drechsler, J., Reiter, J.P.: An empirical evaluation of easily implemented, nonparametric methods for generating synthetic datasets. Comput. Stat. Data Anal. 55(12), 3232–3243 (2011) European Medicines Agency [2018] European Medicines Agency: External guidance on the implementation of the European Medicines Agency policy on the publication of clinical data for medicinal products for human use. https://www.ema.europa.eu/en/human-regulatory/marketing-authorisation/clinical-data-publication/support-industry/external-guidance-implementation-european-medicines-agency-policy-publication-clinical-data (2018) Health Canada [2019] Health Canada: Public Release of Clinical Information: guidance document. https://www.canada.ca/en/health-canada/services/drug-health-product-review-approval/profile-public-release-clinical-information-guidance/document.html (2019) Karr et al. [2006] Karr, A.F., Kohnen, C.N., Oganian, A., Reiter, J.P., Sanil, A.P.: A framework for evaluating the utility of data altered to protect confidentiality. The American Statistician 60(3), 224–232 (2006) https://doi.org/10.1198/000313006x124640 Zhu et al. [2022] Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) 
IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). https://doi.org/10.1109/BigData55660.2022.10020639 Villani [2009] Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. [2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. 
[2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Yale, A., Dash, S., Dutta, R., Guyon, I., Pavao, A., Bennett, K.P.: Privacy preserving synthetic health data. In: 27th European Symposium on Artificial Neural Networks, ESANN 2019, Bruges, Belgium, April 24-26, 2019 (2019) Yan et al. [2022] Yan, C., Yan, Y., Wan, Z., Zhang, Z., Omberg, L., Guinney, J., Mooney, S.D., Malin, B.A.: A multifaceted benchmarking of synthetic electronic health record generation models. Nature Communications 13(1) (2022) https://doi.org/10.1038/s41467-022-35295-1 Murtaza et al. [2023] Murtaza, H., Ahmed, M., Khan, N.F., Murtaza, G., Zafar, S., Bano, A.: Synthetic data generation: State of the art in health care domain. 
https://doi.org/10.48550/arXiv.1406.2661 Sun and Erath [2015] Sun, L., Erath, A.: A bayesian network approach for population synthesis. Transportation Research Part C: Emerging Technologies 61, 49–62 (2015) https://doi.org/10.1016/j.trc.2015.10.010 Reiter [2005] Reiter, J.P.: Using cart to generate partially synthetic public use microdata. Journal of Official Statistics 21(3), 441–462 (2005) Drechsler and Reiter [2011] Drechsler, J., Reiter, J.P.: An empirical evaluation of easily implemented, nonparametric methods for generating synthetic datasets. Comput. Stat. Data Anal. 55(12), 3232–3243 (2011) European Medicines Agency [2018] European Medicines Agency: External guidance on the implementation of the European Medicines Agency policy on the publication of clinical data for medicinal products for human use. https://www.ema.europa.eu/en/human-regulatory/marketing-authorisation/clinical-data-publication/support-industry/external-guidance-implementation-european-medicines-agency-policy-publication-clinical-data (2018) Health Canada [2019] Health Canada: Public Release of Clinical Information: guidance document. https://www.canada.ca/en/health-canada/services/drug-health-product-review-approval/profile-public-release-clinical-information-guidance/document.html (2019) Karr et al. [2006] Karr, A.F., Kohnen, C.N., Oganian, A., Reiter, J.P., Sanil, A.P.: A framework for evaluating the utility of data altered to protect confidentiality. The American Statistician 60(3), 224–232 (2006) https://doi.org/10.1198/000313006x124640 Zhu et al. [2022] Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). https://doi.org/10.1109/BigData55660.2022.10020639 Villani [2009] Villani, C.: Optimal Transport. 
Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. [2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). 
IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Yoon, J., Drumright, L.N., Schaar, M.: Anonymization through data synthesis using generative adversarial networks (ADS-GAN). IEEE J. Biomed. Health Informatics 24(8), 2378–2388 (2020) https://doi.org/10.1109/JBHI.2020.2980262 Xu et al. [2019] Xu, L., Skoularidou, M., Cuesta-Infante, A., Veeramachaneni, K.: Modeling tabular data using conditional GAN. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 7333–7343 (2019) Goodfellow et al. [2014] Goodfellow, I.J., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., Bengio, Y.: Generative Adversarial Networks. arXiv, preprint (2014). https://doi.org/10.48550/arXiv.1406.2661 Sun and Erath [2015] Sun, L., Erath, A.: A bayesian network approach for population synthesis. 
Transportation Research Part C: Emerging Technologies 61, 49–62 (2015) https://doi.org/10.1016/j.trc.2015.10.010 Reiter [2005] Reiter, J.P.: Using cart to generate partially synthetic public use microdata. Journal of Official Statistics 21(3), 441–462 (2005) Drechsler and Reiter [2011] Drechsler, J., Reiter, J.P.: An empirical evaluation of easily implemented, nonparametric methods for generating synthetic datasets. Comput. Stat. Data Anal. 55(12), 3232–3243 (2011) European Medicines Agency [2018] European Medicines Agency: External guidance on the implementation of the European Medicines Agency policy on the publication of clinical data for medicinal products for human use. https://www.ema.europa.eu/en/human-regulatory/marketing-authorisation/clinical-data-publication/support-industry/external-guidance-implementation-european-medicines-agency-policy-publication-clinical-data (2018) Health Canada [2019] Health Canada: Public Release of Clinical Information: guidance document. https://www.canada.ca/en/health-canada/services/drug-health-product-review-approval/profile-public-release-clinical-information-guidance/document.html (2019) Karr et al. [2006] Karr, A.F., Kohnen, C.N., Oganian, A., Reiter, J.P., Sanil, A.P.: A framework for evaluating the utility of data altered to protect confidentiality. The American Statistician 60(3), 224–232 (2006) https://doi.org/10.1198/000313006x124640 Zhu et al. [2022] Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). https://doi.org/10.1109/BigData55660.2022.10020639 Villani [2009] Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. 
[2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). 
https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Xu, L., Skoularidou, M., Cuesta-Infante, A., Veeramachaneni, K.: Modeling tabular data using conditional GAN. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 7333–7343 (2019) Goodfellow et al. [2014] Goodfellow, I.J., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., Bengio, Y.: Generative Adversarial Networks. arXiv, preprint (2014). https://doi.org/10.48550/arXiv.1406.2661 Sun and Erath [2015] Sun, L., Erath, A.: A bayesian network approach for population synthesis. Transportation Research Part C: Emerging Technologies 61, 49–62 (2015) https://doi.org/10.1016/j.trc.2015.10.010 Reiter [2005] Reiter, J.P.: Using cart to generate partially synthetic public use microdata. 
Journal of Official Statistics 21(3), 441–462 (2005) Drechsler and Reiter [2011] Drechsler, J., Reiter, J.P.: An empirical evaluation of easily implemented, nonparametric methods for generating synthetic datasets. Comput. Stat. Data Anal. 55(12), 3232–3243 (2011) European Medicines Agency [2018] European Medicines Agency: External guidance on the implementation of the European Medicines Agency policy on the publication of clinical data for medicinal products for human use. https://www.ema.europa.eu/en/human-regulatory/marketing-authorisation/clinical-data-publication/support-industry/external-guidance-implementation-european-medicines-agency-policy-publication-clinical-data (2018) Health Canada [2019] Health Canada: Public Release of Clinical Information: guidance document. https://www.canada.ca/en/health-canada/services/drug-health-product-review-approval/profile-public-release-clinical-information-guidance/document.html (2019) Karr et al. [2006] Karr, A.F., Kohnen, C.N., Oganian, A., Reiter, J.P., Sanil, A.P.: A framework for evaluating the utility of data altered to protect confidentiality. The American Statistician 60(3), 224–232 (2006) https://doi.org/10.1198/000313006x124640 Zhu et al. [2022] Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). https://doi.org/10.1109/BigData55660.2022.10020639 Villani [2009] Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. [2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. 
Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 
13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Goodfellow, I.J., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., Bengio, Y.: Generative Adversarial Networks. arXiv, preprint (2014). https://doi.org/10.48550/arXiv.1406.2661 Sun and Erath [2015] Sun, L., Erath, A.: A bayesian network approach for population synthesis. Transportation Research Part C: Emerging Technologies 61, 49–62 (2015) https://doi.org/10.1016/j.trc.2015.10.010 Reiter [2005] Reiter, J.P.: Using cart to generate partially synthetic public use microdata. Journal of Official Statistics 21(3), 441–462 (2005) Drechsler and Reiter [2011] Drechsler, J., Reiter, J.P.: An empirical evaluation of easily implemented, nonparametric methods for generating synthetic datasets. Comput. Stat. Data Anal. 55(12), 3232–3243 (2011) European Medicines Agency [2018] European Medicines Agency: External guidance on the implementation of the European Medicines Agency policy on the publication of clinical data for medicinal products for human use. 
https://www.ema.europa.eu/en/human-regulatory/marketing-authorisation/clinical-data-publication/support-industry/external-guidance-implementation-european-medicines-agency-policy-publication-clinical-data (2018) Health Canada [2019] Health Canada: Public Release of Clinical Information: guidance document. https://www.canada.ca/en/health-canada/services/drug-health-product-review-approval/profile-public-release-clinical-information-guidance/document.html (2019) Karr et al. [2006] Karr, A.F., Kohnen, C.N., Oganian, A., Reiter, J.P., Sanil, A.P.: A framework for evaluating the utility of data altered to protect confidentiality. The American Statistician 60(3), 224–232 (2006) https://doi.org/10.1198/000313006x124640 Zhu et al. [2022] Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). https://doi.org/10.1109/BigData55660.2022.10020639 Villani [2009] Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. [2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. 
[2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. 
[2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Sun, L., Erath, A.: A bayesian network approach for population synthesis. Transportation Research Part C: Emerging Technologies 61, 49–62 (2015) https://doi.org/10.1016/j.trc.2015.10.010 Reiter [2005] Reiter, J.P.: Using cart to generate partially synthetic public use microdata. Journal of Official Statistics 21(3), 441–462 (2005) Drechsler and Reiter [2011] Drechsler, J., Reiter, J.P.: An empirical evaluation of easily implemented, nonparametric methods for generating synthetic datasets. Comput. Stat. Data Anal. 55(12), 3232–3243 (2011) European Medicines Agency [2018] European Medicines Agency: External guidance on the implementation of the European Medicines Agency policy on the publication of clinical data for medicinal products for human use. https://www.ema.europa.eu/en/human-regulatory/marketing-authorisation/clinical-data-publication/support-industry/external-guidance-implementation-european-medicines-agency-policy-publication-clinical-data (2018) Health Canada [2019] Health Canada: Public Release of Clinical Information: guidance document. https://www.canada.ca/en/health-canada/services/drug-health-product-review-approval/profile-public-release-clinical-information-guidance/document.html (2019) Karr et al. [2006] Karr, A.F., Kohnen, C.N., Oganian, A., Reiter, J.P., Sanil, A.P.: A framework for evaluating the utility of data altered to protect confidentiality. The American Statistician 60(3), 224–232 (2006) https://doi.org/10.1198/000313006x124640 Zhu et al. [2022] Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. 
IEEE, Osaka, Japan, December 17-20 (2022). https://doi.org/10.1109/BigData55660.2022.10020639 Villani [2009] Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. [2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. 
[2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. 
Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. 
[2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. 
[2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. 
[2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083
  13. Lenz, S., Hess, M., Binder, H.: Deep generative models in DataSHIELD. BMC Medical Research Methodology 21(1) (2021) https://doi.org/10.1186/s12874-021-01237-6
Transportation Research Part C: Emerging Technologies 61, 49–62 (2015) https://doi.org/10.1016/j.trc.2015.10.010 Reiter [2005] Reiter, J.P.: Using cart to generate partially synthetic public use microdata. Journal of Official Statistics 21(3), 441–462 (2005) Drechsler and Reiter [2011] Drechsler, J., Reiter, J.P.: An empirical evaluation of easily implemented, nonparametric methods for generating synthetic datasets. Comput. Stat. Data Anal. 55(12), 3232–3243 (2011) European Medicines Agency [2018] European Medicines Agency: External guidance on the implementation of the European Medicines Agency policy on the publication of clinical data for medicinal products for human use. https://www.ema.europa.eu/en/human-regulatory/marketing-authorisation/clinical-data-publication/support-industry/external-guidance-implementation-european-medicines-agency-policy-publication-clinical-data (2018) Health Canada [2019] Health Canada: Public Release of Clinical Information: guidance document. https://www.canada.ca/en/health-canada/services/drug-health-product-review-approval/profile-public-release-clinical-information-guidance/document.html (2019) Karr et al. [2006] Karr, A.F., Kohnen, C.N., Oganian, A., Reiter, J.P., Sanil, A.P.: A framework for evaluating the utility of data altered to protect confidentiality. The American Statistician 60(3), 224–232 (2006) https://doi.org/10.1198/000313006x124640 Zhu et al. [2022] Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). https://doi.org/10.1109/BigData55660.2022.10020639 Villani [2009] Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. 
[2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). 
https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Brenninkmeijer, B.: Table Evaluator. GitHub (2021) Yale et al. [2019] Yale, A., Dash, S., Dutta, R., Guyon, I., Pavao, A., Bennett, K.P.: Privacy preserving synthetic health data. In: 27th European Symposium on Artificial Neural Networks, ESANN 2019, Bruges, Belgium, April 24-26, 2019 (2019) Yan et al. [2022] Yan, C., Yan, Y., Wan, Z., Zhang, Z., Omberg, L., Guinney, J., Mooney, S.D., Malin, B.A.: A multifaceted benchmarking of synthetic electronic health record generation models. Nature Communications 13(1) (2022) https://doi.org/10.1038/s41467-022-35295-1 Murtaza et al. [2023] Murtaza, H., Ahmed, M., Khan, N.F., Murtaza, G., Zafar, S., Bano, A.: Synthetic data generation: State of the art in health care domain. Computer Science Review 48, 100546 (2023) https://doi.org/10.1016/j.cosrev.2023.100546 [22] Lautrup, A.D., Hyrup, T., Zimek, A., Schneider-Kamp, P.: Systematised review of generative modelling tools and utility metrics for fully synthetic tabular data. 
[under review] Gower [1971] Gower, J.C.: A general coefficient of similarity and some of its properties. Biometrics 27(4), 857–871 (1971) https://doi.org/10.2307/2528823 Kamal et al. [2019] Kamal, S., ElEleimy, M., Hegazy, D., Nasr, M.: Hepatitis C Virus (HCV) for Egyptian patients. UCI Machine Learning Repository, dataset (2019). https://doi.org/10.24432/c5989v Chawla et al. [2002] Chawla, N.V., Bowyer, K.W., Hall, L.O., Kegelmeyer, W.P.: SMOTE: synthetic minority over-sampling technique. J. Artif. Intell. Res. 16, 321–357 (2002) https://doi.org/10.1613/jair.953 Yoon et al. [2020] Yoon, J., Drumright, L.N., Schaar, M.: Anonymization through data synthesis using generative adversarial networks (ADS-GAN). IEEE J. Biomed. Health Informatics 24(8), 2378–2388 (2020) https://doi.org/10.1109/JBHI.2020.2980262 Xu et al. [2019] Xu, L., Skoularidou, M., Cuesta-Infante, A., Veeramachaneni, K.: Modeling tabular data using conditional GAN. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 7333–7343 (2019) Goodfellow et al. [2014] Goodfellow, I.J., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., Bengio, Y.: Generative Adversarial Networks. arXiv, preprint (2014). https://doi.org/10.48550/arXiv.1406.2661 Sun and Erath [2015] Sun, L., Erath, A.: A bayesian network approach for population synthesis. Transportation Research Part C: Emerging Technologies 61, 49–62 (2015) https://doi.org/10.1016/j.trc.2015.10.010 Reiter [2005] Reiter, J.P.: Using cart to generate partially synthetic public use microdata. Journal of Official Statistics 21(3), 441–462 (2005) Drechsler and Reiter [2011] Drechsler, J., Reiter, J.P.: An empirical evaluation of easily implemented, nonparametric methods for generating synthetic datasets. Comput. 
Stat. Data Anal. 55(12), 3232–3243 (2011) European Medicines Agency [2018] European Medicines Agency: External guidance on the implementation of the European Medicines Agency policy on the publication of clinical data for medicinal products for human use. https://www.ema.europa.eu/en/human-regulatory/marketing-authorisation/clinical-data-publication/support-industry/external-guidance-implementation-european-medicines-agency-policy-publication-clinical-data (2018) Health Canada [2019] Health Canada: Public Release of Clinical Information: guidance document. https://www.canada.ca/en/health-canada/services/drug-health-product-review-approval/profile-public-release-clinical-information-guidance/document.html (2019) Karr et al. [2006] Karr, A.F., Kohnen, C.N., Oganian, A., Reiter, J.P., Sanil, A.P.: A framework for evaluating the utility of data altered to protect confidentiality. The American Statistician 60(3), 224–232 (2006) https://doi.org/10.1198/000313006x124640 Zhu et al. [2022] Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). https://doi.org/10.1109/BigData55660.2022.10020639 Villani [2009] Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. [2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. 
IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. 
[2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Yale, A., Dash, S., Dutta, R., Guyon, I., Pavao, A., Bennett, K.P.: Privacy preserving synthetic health data. In: 27th European Symposium on Artificial Neural Networks, ESANN 2019, Bruges, Belgium, April 24-26, 2019 (2019) Yan et al. [2022] Yan, C., Yan, Y., Wan, Z., Zhang, Z., Omberg, L., Guinney, J., Mooney, S.D., Malin, B.A.: A multifaceted benchmarking of synthetic electronic health record generation models. Nature Communications 13(1) (2022) https://doi.org/10.1038/s41467-022-35295-1 Murtaza et al. [2023] Murtaza, H., Ahmed, M., Khan, N.F., Murtaza, G., Zafar, S., Bano, A.: Synthetic data generation: State of the art in health care domain. Computer Science Review 48, 100546 (2023) https://doi.org/10.1016/j.cosrev.2023.100546 [22] Lautrup, A.D., Hyrup, T., Zimek, A., Schneider-Kamp, P.: Systematised review of generative modelling tools and utility metrics for fully synthetic tabular data. [under review] Gower [1971] Gower, J.C.: A general coefficient of similarity and some of its properties. Biometrics 27(4), 857–871 (1971) https://doi.org/10.2307/2528823 Kamal et al. [2019] Kamal, S., ElEleimy, M., Hegazy, D., Nasr, M.: Hepatitis C Virus (HCV) for Egyptian patients. UCI Machine Learning Repository, dataset (2019). https://doi.org/10.24432/c5989v Chawla et al. [2002] Chawla, N.V., Bowyer, K.W., Hall, L.O., Kegelmeyer, W.P.: SMOTE: synthetic minority over-sampling technique. J. Artif. Intell. Res. 16, 321–357 (2002) https://doi.org/10.1613/jair.953 Yoon et al. 
[2020] Yoon, J., Drumright, L.N., Schaar, M.: Anonymization through data synthesis using generative adversarial networks (ADS-GAN). IEEE J. Biomed. Health Informatics 24(8), 2378–2388 (2020) https://doi.org/10.1109/JBHI.2020.2980262 Xu et al. [2019] Xu, L., Skoularidou, M., Cuesta-Infante, A., Veeramachaneni, K.: Modeling tabular data using conditional GAN. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 7333–7343 (2019) Goodfellow et al. [2014] Goodfellow, I.J., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., Bengio, Y.: Generative Adversarial Networks. arXiv, preprint (2014). https://doi.org/10.48550/arXiv.1406.2661 Sun and Erath [2015] Sun, L., Erath, A.: A bayesian network approach for population synthesis. Transportation Research Part C: Emerging Technologies 61, 49–62 (2015) https://doi.org/10.1016/j.trc.2015.10.010 Reiter [2005] Reiter, J.P.: Using cart to generate partially synthetic public use microdata. Journal of Official Statistics 21(3), 441–462 (2005) Drechsler and Reiter [2011] Drechsler, J., Reiter, J.P.: An empirical evaluation of easily implemented, nonparametric methods for generating synthetic datasets. Comput. Stat. Data Anal. 55(12), 3232–3243 (2011) European Medicines Agency [2018] European Medicines Agency: External guidance on the implementation of the European Medicines Agency policy on the publication of clinical data for medicinal products for human use. https://www.ema.europa.eu/en/human-regulatory/marketing-authorisation/clinical-data-publication/support-industry/external-guidance-implementation-european-medicines-agency-policy-publication-clinical-data (2018) Health Canada [2019] Health Canada: Public Release of Clinical Information: guidance document. 
https://www.canada.ca/en/health-canada/services/drug-health-product-review-approval/profile-public-release-clinical-information-guidance/document.html (2019) Karr et al. [2006] Karr, A.F., Kohnen, C.N., Oganian, A., Reiter, J.P., Sanil, A.P.: A framework for evaluating the utility of data altered to protect confidentiality. The American Statistician 60(3), 224–232 (2006) https://doi.org/10.1198/000313006x124640 Zhu et al. [2022] Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). https://doi.org/10.1109/BigData55660.2022.10020639 Villani [2009] Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. [2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. 
arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. 
JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Yan, C., Yan, Y., Wan, Z., Zhang, Z., Omberg, L., Guinney, J., Mooney, S.D., Malin, B.A.: A multifaceted benchmarking of synthetic electronic health record generation models. Nature Communications 13(1) (2022) https://doi.org/10.1038/s41467-022-35295-1 Murtaza et al. [2023] Murtaza, H., Ahmed, M., Khan, N.F., Murtaza, G., Zafar, S., Bano, A.: Synthetic data generation: State of the art in health care domain. Computer Science Review 48, 100546 (2023) https://doi.org/10.1016/j.cosrev.2023.100546 [22] Lautrup, A.D., Hyrup, T., Zimek, A., Schneider-Kamp, P.: Systematised review of generative modelling tools and utility metrics for fully synthetic tabular data. [under review] Gower [1971] Gower, J.C.: A general coefficient of similarity and some of its properties. Biometrics 27(4), 857–871 (1971) https://doi.org/10.2307/2528823 Kamal et al. [2019] Kamal, S., ElEleimy, M., Hegazy, D., Nasr, M.: Hepatitis C Virus (HCV) for Egyptian patients. UCI Machine Learning Repository, dataset (2019). https://doi.org/10.24432/c5989v Chawla et al. [2002] Chawla, N.V., Bowyer, K.W., Hall, L.O., Kegelmeyer, W.P.: SMOTE: synthetic minority over-sampling technique. J. Artif. Intell. Res. 16, 321–357 (2002) https://doi.org/10.1613/jair.953 Yoon et al. [2020] Yoon, J., Drumright, L.N., Schaar, M.: Anonymization through data synthesis using generative adversarial networks (ADS-GAN). IEEE J. Biomed. Health Informatics 24(8), 2378–2388 (2020) https://doi.org/10.1109/JBHI.2020.2980262 Xu et al. [2019] Xu, L., Skoularidou, M., Cuesta-Infante, A., Veeramachaneni, K.: Modeling tabular data using conditional GAN. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 7333–7343 (2019) Goodfellow et al. 
[2014] Goodfellow, I.J., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., Bengio, Y.: Generative Adversarial Networks. arXiv, preprint (2014). https://doi.org/10.48550/arXiv.1406.2661 Sun and Erath [2015] Sun, L., Erath, A.: A bayesian network approach for population synthesis. Transportation Research Part C: Emerging Technologies 61, 49–62 (2015) https://doi.org/10.1016/j.trc.2015.10.010 Reiter [2005] Reiter, J.P.: Using cart to generate partially synthetic public use microdata. Journal of Official Statistics 21(3), 441–462 (2005) Drechsler and Reiter [2011] Drechsler, J., Reiter, J.P.: An empirical evaluation of easily implemented, nonparametric methods for generating synthetic datasets. Comput. Stat. Data Anal. 55(12), 3232–3243 (2011) European Medicines Agency [2018] European Medicines Agency: External guidance on the implementation of the European Medicines Agency policy on the publication of clinical data for medicinal products for human use. https://www.ema.europa.eu/en/human-regulatory/marketing-authorisation/clinical-data-publication/support-industry/external-guidance-implementation-european-medicines-agency-policy-publication-clinical-data (2018) Health Canada [2019] Health Canada: Public Release of Clinical Information: guidance document. https://www.canada.ca/en/health-canada/services/drug-health-product-review-approval/profile-public-release-clinical-information-guidance/document.html (2019) Karr et al. [2006] Karr, A.F., Kohnen, C.N., Oganian, A., Reiter, J.P., Sanil, A.P.: A framework for evaluating the utility of data altered to protect confidentiality. The American Statistician 60(3), 224–232 (2006) https://doi.org/10.1198/000313006x124640 Zhu et al. [2022] Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) 
IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). https://doi.org/10.1109/BigData55660.2022.10020639 Villani [2009] Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. [2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. 
https://www.ema.europa.eu/en/human-regulatory/marketing-authorisation/clinical-data-publication/support-industry/external-guidance-implementation-european-medicines-agency-policy-publication-clinical-data (2018) Health Canada [2019] Health Canada: Public Release of Clinical Information: guidance document. https://www.canada.ca/en/health-canada/services/drug-health-product-review-approval/profile-public-release-clinical-information-guidance/document.html (2019) Karr et al. [2006] Karr, A.F., Kohnen, C.N., Oganian, A., Reiter, J.P., Sanil, A.P.: A framework for evaluating the utility of data altered to protect confidentiality. The American Statistician 60(3), 224–232 (2006) https://doi.org/10.1198/000313006x124640 Zhu et al. [2022] Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). https://doi.org/10.1109/BigData55660.2022.10020639 Villani [2009] Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. [2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. 
[2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. 
[2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Goodfellow, I.J., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., Bengio, Y.: Generative Adversarial Networks. arXiv, preprint (2014). https://doi.org/10.48550/arXiv.1406.2661 Sun and Erath [2015] Sun, L., Erath, A.: A bayesian network approach for population synthesis. Transportation Research Part C: Emerging Technologies 61, 49–62 (2015) https://doi.org/10.1016/j.trc.2015.10.010 Reiter [2005] Reiter, J.P.: Using cart to generate partially synthetic public use microdata. Journal of Official Statistics 21(3), 441–462 (2005) Drechsler and Reiter [2011] Drechsler, J., Reiter, J.P.: An empirical evaluation of easily implemented, nonparametric methods for generating synthetic datasets. Comput. Stat. Data Anal. 55(12), 3232–3243 (2011) European Medicines Agency [2018] European Medicines Agency: External guidance on the implementation of the European Medicines Agency policy on the publication of clinical data for medicinal products for human use. https://www.ema.europa.eu/en/human-regulatory/marketing-authorisation/clinical-data-publication/support-industry/external-guidance-implementation-european-medicines-agency-policy-publication-clinical-data (2018) Health Canada [2019] Health Canada: Public Release of Clinical Information: guidance document. https://www.canada.ca/en/health-canada/services/drug-health-product-review-approval/profile-public-release-clinical-information-guidance/document.html (2019) Karr et al. [2006] Karr, A.F., Kohnen, C.N., Oganian, A., Reiter, J.P., Sanil, A.P.: A framework for evaluating the utility of data altered to protect confidentiality. The American Statistician 60(3), 224–232 (2006) https://doi.org/10.1198/000313006x124640 Zhu et al. [2022] Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. 
In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). https://doi.org/10.1109/BigData55660.2022.10020639 Villani [2009] Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. [2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 
14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Sun, L., Erath, A.: A bayesian network approach for population synthesis. Transportation Research Part C: Emerging Technologies 61, 49–62 (2015) https://doi.org/10.1016/j.trc.2015.10.010 Reiter [2005] Reiter, J.P.: Using cart to generate partially synthetic public use microdata. Journal of Official Statistics 21(3), 441–462 (2005) Drechsler and Reiter [2011] Drechsler, J., Reiter, J.P.: An empirical evaluation of easily implemented, nonparametric methods for generating synthetic datasets. Comput. Stat. Data Anal. 
55(12), 3232–3243 (2011) European Medicines Agency [2018] European Medicines Agency: External guidance on the implementation of the European Medicines Agency policy on the publication of clinical data for medicinal products for human use. https://www.ema.europa.eu/en/human-regulatory/marketing-authorisation/clinical-data-publication/support-industry/external-guidance-implementation-european-medicines-agency-policy-publication-clinical-data (2018) Health Canada [2019] Health Canada: Public Release of Clinical Information: guidance document. https://www.canada.ca/en/health-canada/services/drug-health-product-review-approval/profile-public-release-clinical-information-guidance/document.html (2019) Karr et al. [2006] Karr, A.F., Kohnen, C.N., Oganian, A., Reiter, J.P., Sanil, A.P.: A framework for evaluating the utility of data altered to protect confidentiality. The American Statistician 60(3), 224–232 (2006) https://doi.org/10.1198/000313006x124640 Zhu et al. [2022] Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). https://doi.org/10.1109/BigData55660.2022.10020639 Villani [2009] Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. [2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. 
IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. 
[2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Reiter, J.P.: Using cart to generate partially synthetic public use microdata. Journal of Official Statistics 21(3), 441–462 (2005) Drechsler and Reiter [2011] Drechsler, J., Reiter, J.P.: An empirical evaluation of easily implemented, nonparametric methods for generating synthetic datasets. Comput. Stat. Data Anal. 55(12), 3232–3243 (2011) European Medicines Agency [2018] European Medicines Agency: External guidance on the implementation of the European Medicines Agency policy on the publication of clinical data for medicinal products for human use. https://www.ema.europa.eu/en/human-regulatory/marketing-authorisation/clinical-data-publication/support-industry/external-guidance-implementation-european-medicines-agency-policy-publication-clinical-data (2018) Health Canada [2019] Health Canada: Public Release of Clinical Information: guidance document. https://www.canada.ca/en/health-canada/services/drug-health-product-review-approval/profile-public-release-clinical-information-guidance/document.html (2019) Karr et al. [2006] Karr, A.F., Kohnen, C.N., Oganian, A., Reiter, J.P., Sanil, A.P.: A framework for evaluating the utility of data altered to protect confidentiality. The American Statistician 60(3), 224–232 (2006) https://doi.org/10.1198/000313006x124640 Zhu et al. [2022] Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) 
IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). https://doi.org/10.1109/BigData55660.2022.10020639 Villani [2009] Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. [2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. 
[2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Drechsler, J., Reiter, J.P.: An empirical evaluation of easily implemented, nonparametric methods for generating synthetic datasets. Comput. Stat. Data Anal. 55(12), 3232–3243 (2011) European Medicines Agency [2018] European Medicines Agency: External guidance on the implementation of the European Medicines Agency policy on the publication of clinical data for medicinal products for human use. https://www.ema.europa.eu/en/human-regulatory/marketing-authorisation/clinical-data-publication/support-industry/external-guidance-implementation-european-medicines-agency-policy-publication-clinical-data (2018) Health Canada [2019] Health Canada: Public Release of Clinical Information: guidance document. 
https://www.canada.ca/en/health-canada/services/drug-health-product-review-approval/profile-public-release-clinical-information-guidance/document.html (2019) Karr et al. [2006] Karr, A.F., Kohnen, C.N., Oganian, A., Reiter, J.P., Sanil, A.P.: A framework for evaluating the utility of data altered to protect confidentiality. The American Statistician 60(3), 224–232 (2006) https://doi.org/10.1198/000313006x124640 Zhu et al. [2022] Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). https://doi.org/10.1109/BigData55660.2022.10020639 Villani [2009] Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. [2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. 
arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. 
JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 European Medicines Agency: External guidance on the implementation of the European Medicines Agency policy on the publication of clinical data for medicinal products for human use. https://www.ema.europa.eu/en/human-regulatory/marketing-authorisation/clinical-data-publication/support-industry/external-guidance-implementation-european-medicines-agency-policy-publication-clinical-data (2018) Health Canada [2019] Health Canada: Public Release of Clinical Information: guidance document. https://www.canada.ca/en/health-canada/services/drug-health-product-review-approval/profile-public-release-clinical-information-guidance/document.html (2019) Karr et al. [2006] Karr, A.F., Kohnen, C.N., Oganian, A., Reiter, J.P., Sanil, A.P.: A framework for evaluating the utility of data altered to protect confidentiality. The American Statistician 60(3), 224–232 (2006) https://doi.org/10.1198/000313006x124640 Zhu et al. [2022] Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). https://doi.org/10.1109/BigData55660.2022.10020639 Villani [2009] Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. [2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. 
[2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. 
[2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Qian, Z., Cebere, B., Schaar, M.: Synthcity: facilitating innovative use cases of synthetic data in different data modalities. arXiv, preprint (2023). https://doi.org/10.48550/arXiv.2301.07573 DataCebo, Inc. [2023] DataCebo, Inc.: Synthetic Data Metrics. (2023). DataCebo, Inc.. Version 0.9.3. https://docs.sdv.dev/sdmetrics/ Brenninkmeijer [2021] Brenninkmeijer, B.: Table Evaluator. GitHub (2021) Yale et al. [2019] Yale, A., Dash, S., Dutta, R., Guyon, I., Pavao, A., Bennett, K.P.: Privacy preserving synthetic health data. In: 27th European Symposium on Artificial Neural Networks, ESANN 2019, Bruges, Belgium, April 24-26, 2019 (2019) Yan et al. [2022] Yan, C., Yan, Y., Wan, Z., Zhang, Z., Omberg, L., Guinney, J., Mooney, S.D., Malin, B.A.: A multifaceted benchmarking of synthetic electronic health record generation models. Nature Communications 13(1) (2022) https://doi.org/10.1038/s41467-022-35295-1 Murtaza et al. [2023] Murtaza, H., Ahmed, M., Khan, N.F., Murtaza, G., Zafar, S., Bano, A.: Synthetic data generation: State of the art in health care domain. Computer Science Review 48, 100546 (2023) https://doi.org/10.1016/j.cosrev.2023.100546 [22] Lautrup, A.D., Hyrup, T., Zimek, A., Schneider-Kamp, P.: Systematised review of generative modelling tools and utility metrics for fully synthetic tabular data. [under review] Gower [1971] Gower, J.C.: A general coefficient of similarity and some of its properties. Biometrics 27(4), 857–871 (1971) https://doi.org/10.2307/2528823 Kamal et al. [2019] Kamal, S., ElEleimy, M., Hegazy, D., Nasr, M.: Hepatitis C Virus (HCV) for Egyptian patients. UCI Machine Learning Repository, dataset (2019). https://doi.org/10.24432/c5989v Chawla et al. 
[2002] Chawla, N.V., Bowyer, K.W., Hall, L.O., Kegelmeyer, W.P.: SMOTE: synthetic minority over-sampling technique. J. Artif. Intell. Res. 16, 321–357 (2002) https://doi.org/10.1613/jair.953 Yoon et al. [2020] Yoon, J., Drumright, L.N., Schaar, M.: Anonymization through data synthesis using generative adversarial networks (ADS-GAN). IEEE J. Biomed. Health Informatics 24(8), 2378–2388 (2020) https://doi.org/10.1109/JBHI.2020.2980262 Xu et al. [2019] Xu, L., Skoularidou, M., Cuesta-Infante, A., Veeramachaneni, K.: Modeling tabular data using conditional GAN. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 7333–7343 (2019) Goodfellow et al. [2014] Goodfellow, I.J., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., Bengio, Y.: Generative Adversarial Networks. arXiv, preprint (2014). https://doi.org/10.48550/arXiv.1406.2661 Sun and Erath [2015] Sun, L., Erath, A.: A bayesian network approach for population synthesis. Transportation Research Part C: Emerging Technologies 61, 49–62 (2015) https://doi.org/10.1016/j.trc.2015.10.010 Reiter [2005] Reiter, J.P.: Using cart to generate partially synthetic public use microdata. Journal of Official Statistics 21(3), 441–462 (2005) Drechsler and Reiter [2011] Drechsler, J., Reiter, J.P.: An empirical evaluation of easily implemented, nonparametric methods for generating synthetic datasets. Comput. Stat. Data Anal. 55(12), 3232–3243 (2011) European Medicines Agency [2018] European Medicines Agency: External guidance on the implementation of the European Medicines Agency policy on the publication of clinical data for medicinal products for human use. 
https://www.ema.europa.eu/en/human-regulatory/marketing-authorisation/clinical-data-publication/support-industry/external-guidance-implementation-european-medicines-agency-policy-publication-clinical-data (2018) Health Canada [2019] Health Canada: Public Release of Clinical Information: guidance document. https://www.canada.ca/en/health-canada/services/drug-health-product-review-approval/profile-public-release-clinical-information-guidance/document.html (2019) Karr et al. [2006] Karr, A.F., Kohnen, C.N., Oganian, A., Reiter, J.P., Sanil, A.P.: A framework for evaluating the utility of data altered to protect confidentiality. The American Statistician 60(3), 224–232 (2006) https://doi.org/10.1198/000313006x124640 Zhu et al. [2022] Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). https://doi.org/10.1109/BigData55660.2022.10020639 Villani [2009] Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. [2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. 
[2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. 
[2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 DataCebo, Inc.: Synthetic Data Metrics. (2023). DataCebo, Inc.. Version 0.9.3. https://docs.sdv.dev/sdmetrics/ Brenninkmeijer [2021] Brenninkmeijer, B.: Table Evaluator. GitHub (2021) Yale et al. [2019] Yale, A., Dash, S., Dutta, R., Guyon, I., Pavao, A., Bennett, K.P.: Privacy preserving synthetic health data. In: 27th European Symposium on Artificial Neural Networks, ESANN 2019, Bruges, Belgium, April 24-26, 2019 (2019) Yan et al. [2022] Yan, C., Yan, Y., Wan, Z., Zhang, Z., Omberg, L., Guinney, J., Mooney, S.D., Malin, B.A.: A multifaceted benchmarking of synthetic electronic health record generation models. Nature Communications 13(1) (2022) https://doi.org/10.1038/s41467-022-35295-1 Murtaza et al. [2023] Murtaza, H., Ahmed, M., Khan, N.F., Murtaza, G., Zafar, S., Bano, A.: Synthetic data generation: State of the art in health care domain. Computer Science Review 48, 100546 (2023) https://doi.org/10.1016/j.cosrev.2023.100546 [22] Lautrup, A.D., Hyrup, T., Zimek, A., Schneider-Kamp, P.: Systematised review of generative modelling tools and utility metrics for fully synthetic tabular data. [under review] Gower [1971] Gower, J.C.: A general coefficient of similarity and some of its properties. Biometrics 27(4), 857–871 (1971) https://doi.org/10.2307/2528823 Kamal et al. [2019] Kamal, S., ElEleimy, M., Hegazy, D., Nasr, M.: Hepatitis C Virus (HCV) for Egyptian patients. UCI Machine Learning Repository, dataset (2019). https://doi.org/10.24432/c5989v Chawla et al. [2002] Chawla, N.V., Bowyer, K.W., Hall, L.O., Kegelmeyer, W.P.: SMOTE: synthetic minority over-sampling technique. J. Artif. Intell. Res. 16, 321–357 (2002) https://doi.org/10.1613/jair.953 Yoon et al. 
[2020] Yoon, J., Drumright, L.N., Schaar, M.: Anonymization through data synthesis using generative adversarial networks (ADS-GAN). IEEE J. Biomed. Health Informatics 24(8), 2378–2388 (2020) https://doi.org/10.1109/JBHI.2020.2980262 Xu et al. [2019] Xu, L., Skoularidou, M., Cuesta-Infante, A., Veeramachaneni, K.: Modeling tabular data using conditional GAN. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 7333–7343 (2019) Goodfellow et al. [2014] Goodfellow, I.J., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., Bengio, Y.: Generative Adversarial Networks. arXiv, preprint (2014). https://doi.org/10.48550/arXiv.1406.2661 Sun and Erath [2015] Sun, L., Erath, A.: A bayesian network approach for population synthesis. Transportation Research Part C: Emerging Technologies 61, 49–62 (2015) https://doi.org/10.1016/j.trc.2015.10.010 Reiter [2005] Reiter, J.P.: Using cart to generate partially synthetic public use microdata. Journal of Official Statistics 21(3), 441–462 (2005) Drechsler and Reiter [2011] Drechsler, J., Reiter, J.P.: An empirical evaluation of easily implemented, nonparametric methods for generating synthetic datasets. Comput. Stat. Data Anal. 55(12), 3232–3243 (2011) European Medicines Agency [2018] European Medicines Agency: External guidance on the implementation of the European Medicines Agency policy on the publication of clinical data for medicinal products for human use. https://www.ema.europa.eu/en/human-regulatory/marketing-authorisation/clinical-data-publication/support-industry/external-guidance-implementation-european-medicines-agency-policy-publication-clinical-data (2018) Health Canada [2019] Health Canada: Public Release of Clinical Information: guidance document. 
https://www.canada.ca/en/health-canada/services/drug-health-product-review-approval/profile-public-release-clinical-information-guidance/document.html (2019) Karr et al. [2006] Karr, A.F., Kohnen, C.N., Oganian, A., Reiter, J.P., Sanil, A.P.: A framework for evaluating the utility of data altered to protect confidentiality. The American Statistician 60(3), 224–232 (2006) https://doi.org/10.1198/000313006x124640 Zhu et al. [2022] Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). https://doi.org/10.1109/BigData55660.2022.10020639 Villani [2009] Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. [2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. 
arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Brenninkmeijer, B.: Table Evaluator. GitHub (2021) Yale et al. 
[2019] Yale, A., Dash, S., Dutta, R., Guyon, I., Pavao, A., Bennett, K.P.: Privacy preserving synthetic health data. In: 27th European Symposium on Artificial Neural Networks, ESANN 2019, Bruges, Belgium, April 24-26, 2019 (2019) Yan et al. [2022] Yan, C., Yan, Y., Wan, Z., Zhang, Z., Omberg, L., Guinney, J., Mooney, S.D., Malin, B.A.: A multifaceted benchmarking of synthetic electronic health record generation models. Nature Communications 13(1) (2022) https://doi.org/10.1038/s41467-022-35295-1 Murtaza et al. [2023] Murtaza, H., Ahmed, M., Khan, N.F., Murtaza, G., Zafar, S., Bano, A.: Synthetic data generation: State of the art in health care domain. Computer Science Review 48, 100546 (2023) https://doi.org/10.1016/j.cosrev.2023.100546 [22] Lautrup, A.D., Hyrup, T., Zimek, A., Schneider-Kamp, P.: Systematised review of generative modelling tools and utility metrics for fully synthetic tabular data. [under review] Gower [1971] Gower, J.C.: A general coefficient of similarity and some of its properties. Biometrics 27(4), 857–871 (1971) https://doi.org/10.2307/2528823 Kamal et al. [2019] Kamal, S., ElEleimy, M., Hegazy, D., Nasr, M.: Hepatitis C Virus (HCV) for Egyptian patients. UCI Machine Learning Repository, dataset (2019). https://doi.org/10.24432/c5989v Chawla et al. [2002] Chawla, N.V., Bowyer, K.W., Hall, L.O., Kegelmeyer, W.P.: SMOTE: synthetic minority over-sampling technique. J. Artif. Intell. Res. 16, 321–357 (2002) https://doi.org/10.1613/jair.953 Yoon et al. [2020] Yoon, J., Drumright, L.N., Schaar, M.: Anonymization through data synthesis using generative adversarial networks (ADS-GAN). IEEE J. Biomed. Health Informatics 24(8), 2378–2388 (2020) https://doi.org/10.1109/JBHI.2020.2980262 Xu et al. [2019] Xu, L., Skoularidou, M., Cuesta-Infante, A., Veeramachaneni, K.: Modeling tabular data using conditional GAN. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) 
Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 7333–7343 (2019) Goodfellow et al. [2014] Goodfellow, I.J., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., Bengio, Y.: Generative Adversarial Networks. arXiv, preprint (2014). https://doi.org/10.48550/arXiv.1406.2661 Sun and Erath [2015] Sun, L., Erath, A.: A bayesian network approach for population synthesis. Transportation Research Part C: Emerging Technologies 61, 49–62 (2015) https://doi.org/10.1016/j.trc.2015.10.010 Reiter [2005] Reiter, J.P.: Using cart to generate partially synthetic public use microdata. Journal of Official Statistics 21(3), 441–462 (2005) Drechsler and Reiter [2011] Drechsler, J., Reiter, J.P.: An empirical evaluation of easily implemented, nonparametric methods for generating synthetic datasets. Comput. Stat. Data Anal. 55(12), 3232–3243 (2011) European Medicines Agency [2018] European Medicines Agency: External guidance on the implementation of the European Medicines Agency policy on the publication of clinical data for medicinal products for human use. https://www.ema.europa.eu/en/human-regulatory/marketing-authorisation/clinical-data-publication/support-industry/external-guidance-implementation-european-medicines-agency-policy-publication-clinical-data (2018) Health Canada [2019] Health Canada: Public Release of Clinical Information: guidance document. https://www.canada.ca/en/health-canada/services/drug-health-product-review-approval/profile-public-release-clinical-information-guidance/document.html (2019) Karr et al. [2006] Karr, A.F., Kohnen, C.N., Oganian, A., Reiter, J.P., Sanil, A.P.: A framework for evaluating the utility of data altered to protect confidentiality. The American Statistician 60(3), 224–232 (2006) https://doi.org/10.1198/000313006x124640 Zhu et al. 
[2022] Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). https://doi.org/10.1109/BigData55660.2022.10020639 Villani [2009] Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. [2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. 
Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Yale, A., Dash, S., Dutta, R., Guyon, I., Pavao, A., Bennett, K.P.: Privacy preserving synthetic health data. In: 27th European Symposium on Artificial Neural Networks, ESANN 2019, Bruges, Belgium, April 24-26, 2019 (2019) Yan et al. 
Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 7333–7343 (2019) Goodfellow et al. [2014] Goodfellow, I.J., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., Bengio, Y.: Generative Adversarial Networks. arXiv, preprint (2014). https://doi.org/10.48550/arXiv.1406.2661 Sun and Erath [2015] Sun, L., Erath, A.: A bayesian network approach for population synthesis. Transportation Research Part C: Emerging Technologies 61, 49–62 (2015) https://doi.org/10.1016/j.trc.2015.10.010 Reiter [2005] Reiter, J.P.: Using cart to generate partially synthetic public use microdata. Journal of Official Statistics 21(3), 441–462 (2005) Drechsler and Reiter [2011] Drechsler, J., Reiter, J.P.: An empirical evaluation of easily implemented, nonparametric methods for generating synthetic datasets. Comput. Stat. Data Anal. 55(12), 3232–3243 (2011) European Medicines Agency [2018] European Medicines Agency: External guidance on the implementation of the European Medicines Agency policy on the publication of clinical data for medicinal products for human use. https://www.ema.europa.eu/en/human-regulatory/marketing-authorisation/clinical-data-publication/support-industry/external-guidance-implementation-european-medicines-agency-policy-publication-clinical-data (2018) Health Canada [2019] Health Canada: Public Release of Clinical Information: guidance document. https://www.canada.ca/en/health-canada/services/drug-health-product-review-approval/profile-public-release-clinical-information-guidance/document.html (2019) Karr et al. [2006] Karr, A.F., Kohnen, C.N., Oganian, A., Reiter, J.P., Sanil, A.P.: A framework for evaluating the utility of data altered to protect confidentiality. The American Statistician 60(3), 224–232 (2006) https://doi.org/10.1198/000313006x124640 Zhu et al. 
[2022] Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). https://doi.org/10.1109/BigData55660.2022.10020639 Villani [2009] Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. [2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. 
Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Yoon, J., Drumright, L.N., Schaar, M.: Anonymization through data synthesis using generative adversarial networks (ADS-GAN). IEEE J. Biomed. Health Informatics 24(8), 2378–2388 (2020) https://doi.org/10.1109/JBHI.2020.2980262 Xu et al. 
[2019] Xu, L., Skoularidou, M., Cuesta-Infante, A., Veeramachaneni, K.: Modeling tabular data using conditional GAN. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 7333–7343 (2019) Goodfellow et al. [2014] Goodfellow, I.J., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., Bengio, Y.: Generative Adversarial Networks. arXiv, preprint (2014). https://doi.org/10.48550/arXiv.1406.2661 Sun and Erath [2015] Sun, L., Erath, A.: A bayesian network approach for population synthesis. Transportation Research Part C: Emerging Technologies 61, 49–62 (2015) https://doi.org/10.1016/j.trc.2015.10.010 Reiter [2005] Reiter, J.P.: Using cart to generate partially synthetic public use microdata. Journal of Official Statistics 21(3), 441–462 (2005) Drechsler and Reiter [2011] Drechsler, J., Reiter, J.P.: An empirical evaluation of easily implemented, nonparametric methods for generating synthetic datasets. Comput. Stat. Data Anal. 55(12), 3232–3243 (2011) European Medicines Agency [2018] European Medicines Agency: External guidance on the implementation of the European Medicines Agency policy on the publication of clinical data for medicinal products for human use. https://www.ema.europa.eu/en/human-regulatory/marketing-authorisation/clinical-data-publication/support-industry/external-guidance-implementation-european-medicines-agency-policy-publication-clinical-data (2018) Health Canada [2019] Health Canada: Public Release of Clinical Information: guidance document. https://www.canada.ca/en/health-canada/services/drug-health-product-review-approval/profile-public-release-clinical-information-guidance/document.html (2019) Karr et al. 
[2006] Karr, A.F., Kohnen, C.N., Oganian, A., Reiter, J.P., Sanil, A.P.: A framework for evaluating the utility of data altered to protect confidentiality. The American Statistician 60(3), 224–232 (2006) https://doi.org/10.1198/000313006x124640 Zhu et al. [2022] Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). https://doi.org/10.1109/BigData55660.2022.10020639 Villani [2009] Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. [2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. 
[2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Xu, L., Skoularidou, M., Cuesta-Infante, A., Veeramachaneni, K.: Modeling tabular data using conditional GAN. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) 
Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 7333–7343 (2019) Goodfellow et al. [2014] Goodfellow, I.J., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., Bengio, Y.: Generative Adversarial Networks. arXiv, preprint (2014). https://doi.org/10.48550/arXiv.1406.2661 Sun and Erath [2015] Sun, L., Erath, A.: A bayesian network approach for population synthesis. Transportation Research Part C: Emerging Technologies 61, 49–62 (2015) https://doi.org/10.1016/j.trc.2015.10.010 Reiter [2005] Reiter, J.P.: Using cart to generate partially synthetic public use microdata. Journal of Official Statistics 21(3), 441–462 (2005) Drechsler and Reiter [2011] Drechsler, J., Reiter, J.P.: An empirical evaluation of easily implemented, nonparametric methods for generating synthetic datasets. Comput. Stat. Data Anal. 55(12), 3232–3243 (2011) European Medicines Agency [2018] European Medicines Agency: External guidance on the implementation of the European Medicines Agency policy on the publication of clinical data for medicinal products for human use. https://www.ema.europa.eu/en/human-regulatory/marketing-authorisation/clinical-data-publication/support-industry/external-guidance-implementation-european-medicines-agency-policy-publication-clinical-data (2018) Health Canada [2019] Health Canada: Public Release of Clinical Information: guidance document. https://www.canada.ca/en/health-canada/services/drug-health-product-review-approval/profile-public-release-clinical-information-guidance/document.html (2019) Karr et al. [2006] Karr, A.F., Kohnen, C.N., Oganian, A., Reiter, J.P., Sanil, A.P.: A framework for evaluating the utility of data altered to protect confidentiality. The American Statistician 60(3), 224–232 (2006) https://doi.org/10.1198/000313006x124640 Zhu et al. 
[2022] Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). https://doi.org/10.1109/BigData55660.2022.10020639 Villani [2009] Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. [2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. 
Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Goodfellow, I.J., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., Bengio, Y.: Generative Adversarial Networks. arXiv, preprint (2014). https://doi.org/10.48550/arXiv.1406.2661 Sun and Erath [2015] Sun, L., Erath, A.: A bayesian network approach for population synthesis. 
Transportation Research Part C: Emerging Technologies 61, 49–62 (2015) https://doi.org/10.1016/j.trc.2015.10.010 Reiter [2005] Reiter, J.P.: Using cart to generate partially synthetic public use microdata. Journal of Official Statistics 21(3), 441–462 (2005) Drechsler and Reiter [2011] Drechsler, J., Reiter, J.P.: An empirical evaluation of easily implemented, nonparametric methods for generating synthetic datasets. Comput. Stat. Data Anal. 55(12), 3232–3243 (2011) European Medicines Agency [2018] European Medicines Agency: External guidance on the implementation of the European Medicines Agency policy on the publication of clinical data for medicinal products for human use. https://www.ema.europa.eu/en/human-regulatory/marketing-authorisation/clinical-data-publication/support-industry/external-guidance-implementation-european-medicines-agency-policy-publication-clinical-data (2018) Health Canada [2019] Health Canada: Public Release of Clinical Information: guidance document. https://www.canada.ca/en/health-canada/services/drug-health-product-review-approval/profile-public-release-clinical-information-guidance/document.html (2019) Karr et al. [2006] Karr, A.F., Kohnen, C.N., Oganian, A., Reiter, J.P., Sanil, A.P.: A framework for evaluating the utility of data altered to protect confidentiality. The American Statistician 60(3), 224–232 (2006) https://doi.org/10.1198/000313006x124640 Zhu et al. [2022] Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). https://doi.org/10.1109/BigData55660.2022.10020639 Villani [2009] Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. 
[2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). 
https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Sun, L., Erath, A.: A bayesian network approach for population synthesis. Transportation Research Part C: Emerging Technologies 61, 49–62 (2015) https://doi.org/10.1016/j.trc.2015.10.010 Reiter [2005] Reiter, J.P.: Using cart to generate partially synthetic public use microdata. Journal of Official Statistics 21(3), 441–462 (2005) Drechsler and Reiter [2011] Drechsler, J., Reiter, J.P.: An empirical evaluation of easily implemented, nonparametric methods for generating synthetic datasets. Comput. Stat. Data Anal. 55(12), 3232–3243 (2011) European Medicines Agency [2018] European Medicines Agency: External guidance on the implementation of the European Medicines Agency policy on the publication of clinical data for medicinal products for human use. 
https://www.ema.europa.eu/en/human-regulatory/marketing-authorisation/clinical-data-publication/support-industry/external-guidance-implementation-european-medicines-agency-policy-publication-clinical-data (2018) Health Canada [2019] Health Canada: Public Release of Clinical Information: guidance document. https://www.canada.ca/en/health-canada/services/drug-health-product-review-approval/profile-public-release-clinical-information-guidance/document.html (2019) Karr et al. [2006] Karr, A.F., Kohnen, C.N., Oganian, A., Reiter, J.P., Sanil, A.P.: A framework for evaluating the utility of data altered to protect confidentiality. The American Statistician 60(3), 224–232 (2006) https://doi.org/10.1198/000313006x124640 Zhu et al. [2022] Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). https://doi.org/10.1109/BigData55660.2022.10020639 Villani [2009] Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. [2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. 
[2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. 
[2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Reiter, J.P.: Using cart to generate partially synthetic public use microdata. Journal of Official Statistics 21(3), 441–462 (2005) Drechsler and Reiter [2011] Drechsler, J., Reiter, J.P.: An empirical evaluation of easily implemented, nonparametric methods for generating synthetic datasets. Comput. Stat. Data Anal. 55(12), 3232–3243 (2011) European Medicines Agency [2018] European Medicines Agency: External guidance on the implementation of the European Medicines Agency policy on the publication of clinical data for medicinal products for human use. https://www.ema.europa.eu/en/human-regulatory/marketing-authorisation/clinical-data-publication/support-industry/external-guidance-implementation-european-medicines-agency-policy-publication-clinical-data (2018) Health Canada [2019] Health Canada: Public Release of Clinical Information: guidance document. https://www.canada.ca/en/health-canada/services/drug-health-product-review-approval/profile-public-release-clinical-information-guidance/document.html (2019) Karr et al. [2006] Karr, A.F., Kohnen, C.N., Oganian, A., Reiter, J.P., Sanil, A.P.: A framework for evaluating the utility of data altered to protect confidentiality. The American Statistician 60(3), 224–232 (2006) https://doi.org/10.1198/000313006x124640 Zhu et al. [2022] Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). https://doi.org/10.1109/BigData55660.2022.10020639 Villani [2009] Villani, C.: Optimal Transport. Springer, Heidelberg (2009). 
https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. [2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). 
IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Drechsler, J., Reiter, J.P.: An empirical evaluation of easily implemented, nonparametric methods for generating synthetic datasets. Comput. Stat. Data Anal. 55(12), 3232–3243 (2011) European Medicines Agency [2018] European Medicines Agency: External guidance on the implementation of the European Medicines Agency policy on the publication of clinical data for medicinal products for human use. https://www.ema.europa.eu/en/human-regulatory/marketing-authorisation/clinical-data-publication/support-industry/external-guidance-implementation-european-medicines-agency-policy-publication-clinical-data (2018) Health Canada [2019] Health Canada: Public Release of Clinical Information: guidance document. https://www.canada.ca/en/health-canada/services/drug-health-product-review-approval/profile-public-release-clinical-information-guidance/document.html (2019) Karr et al. 
[2006] Karr, A.F., Kohnen, C.N., Oganian, A., Reiter, J.P., Sanil, A.P.: A framework for evaluating the utility of data altered to protect confidentiality. The American Statistician 60(3), 224–232 (2006) https://doi.org/10.1198/000313006x124640 Zhu et al. [2022] Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). https://doi.org/10.1109/BigData55660.2022.10020639 Villani [2009] Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. [2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. 
[2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 European Medicines Agency: External guidance on the implementation of the European Medicines Agency policy on the publication of clinical data for medicinal products for human use. 
https://www.ema.europa.eu/en/human-regulatory/marketing-authorisation/clinical-data-publication/support-industry/external-guidance-implementation-european-medicines-agency-policy-publication-clinical-data (2018) Health Canada [2019] Health Canada: Public Release of Clinical Information: guidance document. https://www.canada.ca/en/health-canada/services/drug-health-product-review-approval/profile-public-release-clinical-information-guidance/document.html (2019) Karr et al. [2006] Karr, A.F., Kohnen, C.N., Oganian, A., Reiter, J.P., Sanil, A.P.: A framework for evaluating the utility of data altered to protect confidentiality. The American Statistician 60(3), 224–232 (2006) https://doi.org/10.1198/000313006x124640 Zhu et al. [2022] Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). https://doi.org/10.1109/BigData55660.2022.10020639 Villani [2009] Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. [2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. 
[2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. 
[2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Health Canada: Public Release of Clinical Information: guidance document. https://www.canada.ca/en/health-canada/services/drug-health-product-review-approval/profile-public-release-clinical-information-guidance/document.html (2019) Karr et al. [2006] Karr, A.F., Kohnen, C.N., Oganian, A., Reiter, J.P., Sanil, A.P.: A framework for evaluating the utility of data altered to protect confidentiality. The American Statistician 60(3), 224–232 (2006) https://doi.org/10.1198/000313006x124640 Zhu et al. [2022] Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). https://doi.org/10.1109/BigData55660.2022.10020639 Villani [2009] Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. [2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. 
[2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. 
[2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Karr, A.F., Kohnen, C.N., Oganian, A., Reiter, J.P., Sanil, A.P.: A framework for evaluating the utility of data altered to protect confidentiality. The American Statistician 60(3), 224–232 (2006) https://doi.org/10.1198/000313006x124640 Zhu et al. [2022] Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). https://doi.org/10.1109/BigData55660.2022.10020639 Villani [2009] Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. [2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. 
arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. 
In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). https://doi.org/10.1109/BigData55660.2022.10020639 Villani [2009] Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. [2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 
14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. [2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. 
IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. 
[2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 
14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. 
[2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. 
JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. 
[2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. 
[2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. 
[2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. 
[2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. 
JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. 
AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083
  15. Qian, Z., Cebere, B., Schaar, M.: Synthcity: facilitating innovative use cases of synthetic data in different data modalities. arXiv, preprint (2023). https://doi.org/10.48550/arXiv.2301.07573
Transportation Research Part C: Emerging Technologies 61, 49–62 (2015) https://doi.org/10.1016/j.trc.2015.10.010 Reiter [2005] Reiter, J.P.: Using cart to generate partially synthetic public use microdata. Journal of Official Statistics 21(3), 441–462 (2005) Drechsler and Reiter [2011] Drechsler, J., Reiter, J.P.: An empirical evaluation of easily implemented, nonparametric methods for generating synthetic datasets. Comput. Stat. Data Anal. 55(12), 3232–3243 (2011) European Medicines Agency [2018] European Medicines Agency: External guidance on the implementation of the European Medicines Agency policy on the publication of clinical data for medicinal products for human use. https://www.ema.europa.eu/en/human-regulatory/marketing-authorisation/clinical-data-publication/support-industry/external-guidance-implementation-european-medicines-agency-policy-publication-clinical-data (2018) Health Canada [2019] Health Canada: Public Release of Clinical Information: guidance document. https://www.canada.ca/en/health-canada/services/drug-health-product-review-approval/profile-public-release-clinical-information-guidance/document.html (2019) Karr et al. [2006] Karr, A.F., Kohnen, C.N., Oganian, A., Reiter, J.P., Sanil, A.P.: A framework for evaluating the utility of data altered to protect confidentiality. The American Statistician 60(3), 224–232 (2006) https://doi.org/10.1198/000313006x124640 Zhu et al. [2022] Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). https://doi.org/10.1109/BigData55660.2022.10020639 Villani [2009] Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. 
[2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). 
https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Yan, C., Yan, Y., Wan, Z., Zhang, Z., Omberg, L., Guinney, J., Mooney, S.D., Malin, B.A.: A multifaceted benchmarking of synthetic electronic health record generation models. Nature Communications 13(1) (2022) https://doi.org/10.1038/s41467-022-35295-1 Murtaza et al. [2023] Murtaza, H., Ahmed, M., Khan, N.F., Murtaza, G., Zafar, S., Bano, A.: Synthetic data generation: State of the art in health care domain. Computer Science Review 48, 100546 (2023) https://doi.org/10.1016/j.cosrev.2023.100546 [22] Lautrup, A.D., Hyrup, T., Zimek, A., Schneider-Kamp, P.: Systematised review of generative modelling tools and utility metrics for fully synthetic tabular data. [under review] Gower [1971] Gower, J.C.: A general coefficient of similarity and some of its properties. Biometrics 27(4), 857–871 (1971) https://doi.org/10.2307/2528823 Kamal et al. [2019] Kamal, S., ElEleimy, M., Hegazy, D., Nasr, M.: Hepatitis C Virus (HCV) for Egyptian patients. UCI Machine Learning Repository, dataset (2019). https://doi.org/10.24432/c5989v Chawla et al. 
[2002] Chawla, N.V., Bowyer, K.W., Hall, L.O., Kegelmeyer, W.P.: SMOTE: synthetic minority over-sampling technique. J. Artif. Intell. Res. 16, 321–357 (2002) https://doi.org/10.1613/jair.953 Yoon et al. [2020] Yoon, J., Drumright, L.N., Schaar, M.: Anonymization through data synthesis using generative adversarial networks (ADS-GAN). IEEE J. Biomed. Health Informatics 24(8), 2378–2388 (2020) https://doi.org/10.1109/JBHI.2020.2980262 Xu et al. [2019] Xu, L., Skoularidou, M., Cuesta-Infante, A., Veeramachaneni, K.: Modeling tabular data using conditional GAN. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 7333–7343 (2019) Goodfellow et al. [2014] Goodfellow, I.J., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., Bengio, Y.: Generative Adversarial Networks. arXiv, preprint (2014). https://doi.org/10.48550/arXiv.1406.2661 Sun and Erath [2015] Sun, L., Erath, A.: A bayesian network approach for population synthesis. Transportation Research Part C: Emerging Technologies 61, 49–62 (2015) https://doi.org/10.1016/j.trc.2015.10.010 Reiter [2005] Reiter, J.P.: Using cart to generate partially synthetic public use microdata. Journal of Official Statistics 21(3), 441–462 (2005) Drechsler and Reiter [2011] Drechsler, J., Reiter, J.P.: An empirical evaluation of easily implemented, nonparametric methods for generating synthetic datasets. Comput. Stat. Data Anal. 55(12), 3232–3243 (2011) European Medicines Agency [2018] European Medicines Agency: External guidance on the implementation of the European Medicines Agency policy on the publication of clinical data for medicinal products for human use. 
https://www.ema.europa.eu/en/human-regulatory/marketing-authorisation/clinical-data-publication/support-industry/external-guidance-implementation-european-medicines-agency-policy-publication-clinical-data (2018) Health Canada [2019] Health Canada: Public Release of Clinical Information: guidance document. https://www.canada.ca/en/health-canada/services/drug-health-product-review-approval/profile-public-release-clinical-information-guidance/document.html (2019) Karr et al. [2006] Karr, A.F., Kohnen, C.N., Oganian, A., Reiter, J.P., Sanil, A.P.: A framework for evaluating the utility of data altered to protect confidentiality. The American Statistician 60(3), 224–232 (2006) https://doi.org/10.1198/000313006x124640 Zhu et al. [2022] Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). https://doi.org/10.1109/BigData55660.2022.10020639 Villani [2009] Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. [2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. 
[2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. 
[2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Murtaza, H., Ahmed, M., Khan, N.F., Murtaza, G., Zafar, S., Bano, A.: Synthetic data generation: State of the art in health care domain. Computer Science Review 48, 100546 (2023) https://doi.org/10.1016/j.cosrev.2023.100546 [22] Lautrup, A.D., Hyrup, T., Zimek, A., Schneider-Kamp, P.: Systematised review of generative modelling tools and utility metrics for fully synthetic tabular data. [under review] Gower [1971] Gower, J.C.: A general coefficient of similarity and some of its properties. Biometrics 27(4), 857–871 (1971) https://doi.org/10.2307/2528823 Kamal et al. [2019] Kamal, S., ElEleimy, M., Hegazy, D., Nasr, M.: Hepatitis C Virus (HCV) for Egyptian patients. UCI Machine Learning Repository, dataset (2019). https://doi.org/10.24432/c5989v Chawla et al. [2002] Chawla, N.V., Bowyer, K.W., Hall, L.O., Kegelmeyer, W.P.: SMOTE: synthetic minority over-sampling technique. J. Artif. Intell. Res. 16, 321–357 (2002) https://doi.org/10.1613/jair.953 Yoon et al. [2020] Yoon, J., Drumright, L.N., Schaar, M.: Anonymization through data synthesis using generative adversarial networks (ADS-GAN). IEEE J. Biomed. Health Informatics 24(8), 2378–2388 (2020) https://doi.org/10.1109/JBHI.2020.2980262 Xu et al. [2019] Xu, L., Skoularidou, M., Cuesta-Infante, A., Veeramachaneni, K.: Modeling tabular data using conditional GAN. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 7333–7343 (2019) Goodfellow et al. [2014] Goodfellow, I.J., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., Bengio, Y.: Generative Adversarial Networks. 
arXiv, preprint (2014). https://doi.org/10.48550/arXiv.1406.2661 Sun and Erath [2015] Sun, L., Erath, A.: A bayesian network approach for population synthesis. Transportation Research Part C: Emerging Technologies 61, 49–62 (2015) https://doi.org/10.1016/j.trc.2015.10.010 Reiter [2005] Reiter, J.P.: Using cart to generate partially synthetic public use microdata. Journal of Official Statistics 21(3), 441–462 (2005) Drechsler and Reiter [2011] Drechsler, J., Reiter, J.P.: An empirical evaluation of easily implemented, nonparametric methods for generating synthetic datasets. Comput. Stat. Data Anal. 55(12), 3232–3243 (2011) European Medicines Agency [2018] European Medicines Agency: External guidance on the implementation of the European Medicines Agency policy on the publication of clinical data for medicinal products for human use. https://www.ema.europa.eu/en/human-regulatory/marketing-authorisation/clinical-data-publication/support-industry/external-guidance-implementation-european-medicines-agency-policy-publication-clinical-data (2018) Health Canada [2019] Health Canada: Public Release of Clinical Information: guidance document. https://www.canada.ca/en/health-canada/services/drug-health-product-review-approval/profile-public-release-clinical-information-guidance/document.html (2019) Karr et al. [2006] Karr, A.F., Kohnen, C.N., Oganian, A., Reiter, J.P., Sanil, A.P.: A framework for evaluating the utility of data altered to protect confidentiality. The American Statistician 60(3), 224–232 (2006) https://doi.org/10.1198/000313006x124640 Zhu et al. [2022] Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). 
https://doi.org/10.1109/BigData55660.2022.10020639 Villani [2009] Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. [2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. 
In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Lautrup, A.D., Hyrup, T., Zimek, A., Schneider-Kamp, P.: Systematised review of generative modelling tools and utility metrics for fully synthetic tabular data. [under review] Gower [1971] Gower, J.C.: A general coefficient of similarity and some of its properties. Biometrics 27(4), 857–871 (1971) https://doi.org/10.2307/2528823 Kamal et al. [2019] Kamal, S., ElEleimy, M., Hegazy, D., Nasr, M.: Hepatitis C Virus (HCV) for Egyptian patients. UCI Machine Learning Repository, dataset (2019). https://doi.org/10.24432/c5989v Chawla et al. [2002] Chawla, N.V., Bowyer, K.W., Hall, L.O., Kegelmeyer, W.P.: SMOTE: synthetic minority over-sampling technique. J. Artif. Intell. Res. 16, 321–357 (2002) https://doi.org/10.1613/jair.953 Yoon et al. [2020] Yoon, J., Drumright, L.N., Schaar, M.: Anonymization through data synthesis using generative adversarial networks (ADS-GAN). IEEE J. Biomed. 
Health Informatics 24(8), 2378–2388 (2020) https://doi.org/10.1109/JBHI.2020.2980262 Xu et al. [2019] Xu, L., Skoularidou, M., Cuesta-Infante, A., Veeramachaneni, K.: Modeling tabular data using conditional GAN. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 7333–7343 (2019) Goodfellow et al. [2014] Goodfellow, I.J., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., Bengio, Y.: Generative Adversarial Networks. arXiv, preprint (2014). https://doi.org/10.48550/arXiv.1406.2661 Sun and Erath [2015] Sun, L., Erath, A.: A bayesian network approach for population synthesis. Transportation Research Part C: Emerging Technologies 61, 49–62 (2015) https://doi.org/10.1016/j.trc.2015.10.010 Reiter [2005] Reiter, J.P.: Using cart to generate partially synthetic public use microdata. Journal of Official Statistics 21(3), 441–462 (2005) Drechsler and Reiter [2011] Drechsler, J., Reiter, J.P.: An empirical evaluation of easily implemented, nonparametric methods for generating synthetic datasets. Comput. Stat. Data Anal. 55(12), 3232–3243 (2011) European Medicines Agency [2018] European Medicines Agency: External guidance on the implementation of the European Medicines Agency policy on the publication of clinical data for medicinal products for human use. https://www.ema.europa.eu/en/human-regulatory/marketing-authorisation/clinical-data-publication/support-industry/external-guidance-implementation-european-medicines-agency-policy-publication-clinical-data (2018) Health Canada [2019] Health Canada: Public Release of Clinical Information: guidance document. https://www.canada.ca/en/health-canada/services/drug-health-product-review-approval/profile-public-release-clinical-information-guidance/document.html (2019) Karr et al. 
[2006] Karr, A.F., Kohnen, C.N., Oganian, A., Reiter, J.P., Sanil, A.P.: A framework for evaluating the utility of data altered to protect confidentiality. The American Statistician 60(3), 224–232 (2006) https://doi.org/10.1198/000313006x124640 Zhu et al. [2022] Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). https://doi.org/10.1109/BigData55660.2022.10020639 Villani [2009] Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. [2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. 
[2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Gower, J.C.: A general coefficient of similarity and some of its properties. Biometrics 27(4), 857–871 (1971) https://doi.org/10.2307/2528823 Kamal et al. 
https://www.ema.europa.eu/en/human-regulatory/marketing-authorisation/clinical-data-publication/support-industry/external-guidance-implementation-european-medicines-agency-policy-publication-clinical-data (2018) Health Canada [2019] Health Canada: Public Release of Clinical Information: guidance document. https://www.canada.ca/en/health-canada/services/drug-health-product-review-approval/profile-public-release-clinical-information-guidance/document.html (2019) Karr et al. [2006] Karr, A.F., Kohnen, C.N., Oganian, A., Reiter, J.P., Sanil, A.P.: A framework for evaluating the utility of data altered to protect confidentiality. The American Statistician 60(3), 224–232 (2006) https://doi.org/10.1198/000313006x124640 Zhu et al. [2022] Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). https://doi.org/10.1109/BigData55660.2022.10020639 Villani [2009] Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. [2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. 
[2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. 
[2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Drechsler, J., Reiter, J.P.: An empirical evaluation of easily implemented, nonparametric methods for generating synthetic datasets. Comput. Stat. Data Anal. 55(12), 3232–3243 (2011) European Medicines Agency [2018] European Medicines Agency: External guidance on the implementation of the European Medicines Agency policy on the publication of clinical data for medicinal products for human use. https://www.ema.europa.eu/en/human-regulatory/marketing-authorisation/clinical-data-publication/support-industry/external-guidance-implementation-european-medicines-agency-policy-publication-clinical-data (2018) Health Canada [2019] Health Canada: Public Release of Clinical Information: guidance document. https://www.canada.ca/en/health-canada/services/drug-health-product-review-approval/profile-public-release-clinical-information-guidance/document.html (2019) Karr et al. [2006] Karr, A.F., Kohnen, C.N., Oganian, A., Reiter, J.P., Sanil, A.P.: A framework for evaluating the utility of data altered to protect confidentiality. The American Statistician 60(3), 224–232 (2006) https://doi.org/10.1198/000313006x124640 Zhu et al. [2022] Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). https://doi.org/10.1109/BigData55660.2022.10020639 Villani [2009] Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. [2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. 
In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. 
[2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 European Medicines Agency: External guidance on the implementation of the European Medicines Agency policy on the publication of clinical data for medicinal products for human use. https://www.ema.europa.eu/en/human-regulatory/marketing-authorisation/clinical-data-publication/support-industry/external-guidance-implementation-european-medicines-agency-policy-publication-clinical-data (2018) Health Canada [2019] Health Canada: Public Release of Clinical Information: guidance document. https://www.canada.ca/en/health-canada/services/drug-health-product-review-approval/profile-public-release-clinical-information-guidance/document.html (2019) Karr et al. [2006] Karr, A.F., Kohnen, C.N., Oganian, A., Reiter, J.P., Sanil, A.P.: A framework for evaluating the utility of data altered to protect confidentiality. The American Statistician 60(3), 224–232 (2006) https://doi.org/10.1198/000313006x124640 Zhu et al. [2022] Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. 
In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). https://doi.org/10.1109/BigData55660.2022.10020639 Villani [2009] Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. [2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 
14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Health Canada: Public Release of Clinical Information: guidance document. https://www.canada.ca/en/health-canada/services/drug-health-product-review-approval/profile-public-release-clinical-information-guidance/document.html (2019) Karr et al. [2006] Karr, A.F., Kohnen, C.N., Oganian, A., Reiter, J.P., Sanil, A.P.: A framework for evaluating the utility of data altered to protect confidentiality. The American Statistician 60(3), 224–232 (2006) https://doi.org/10.1198/000313006x124640 Zhu et al. [2022] Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. 
In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). https://doi.org/10.1109/BigData55660.2022.10020639 Villani [2009] Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. [2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 
14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Karr, A.F., Kohnen, C.N., Oganian, A., Reiter, J.P., Sanil, A.P.: A framework for evaluating the utility of data altered to protect confidentiality. The American Statistician 60(3), 224–232 (2006) https://doi.org/10.1198/000313006x124640 Zhu et al. [2022] Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). 
https://doi.org/10.1109/BigData55660.2022.10020639 Villani [2009] Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. [2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. 
In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). https://doi.org/10.1109/BigData55660.2022.10020639 Villani [2009] Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. [2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. 
[2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. 
[2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. [2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. 
Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. 
[2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. 
[2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. 
[2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. 
Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). 
https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. 
[2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. 
Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. 
[2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. 
[2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. 
[2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083
  16. DataCebo, Inc.: Synthetic Data Metrics. DataCebo, Inc. (2023). Version 0.9.3. https://docs.sdv.dev/sdmetrics/
[2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Yan, C., Yan, Y., Wan, Z., Zhang, Z., Omberg, L., Guinney, J., Mooney, S.D., Malin, B.A.: A multifaceted benchmarking of synthetic electronic health record generation models. Nature Communications 13(1) (2022) https://doi.org/10.1038/s41467-022-35295-1 Murtaza et al. [2023] Murtaza, H., Ahmed, M., Khan, N.F., Murtaza, G., Zafar, S., Bano, A.: Synthetic data generation: State of the art in health care domain. Computer Science Review 48, 100546 (2023) https://doi.org/10.1016/j.cosrev.2023.100546 [22] Lautrup, A.D., Hyrup, T., Zimek, A., Schneider-Kamp, P.: Systematised review of generative modelling tools and utility metrics for fully synthetic tabular data. 
[under review] Gower [1971] Gower, J.C.: A general coefficient of similarity and some of its properties. Biometrics 27(4), 857–871 (1971) https://doi.org/10.2307/2528823 Kamal et al. [2019] Kamal, S., ElEleimy, M., Hegazy, D., Nasr, M.: Hepatitis C Virus (HCV) for Egyptian patients. UCI Machine Learning Repository, dataset (2019). https://doi.org/10.24432/c5989v Chawla et al. [2002] Chawla, N.V., Bowyer, K.W., Hall, L.O., Kegelmeyer, W.P.: SMOTE: synthetic minority over-sampling technique. J. Artif. Intell. Res. 16, 321–357 (2002) https://doi.org/10.1613/jair.953 Yoon et al. [2020] Yoon, J., Drumright, L.N., Schaar, M.: Anonymization through data synthesis using generative adversarial networks (ADS-GAN). IEEE J. Biomed. Health Informatics 24(8), 2378–2388 (2020) https://doi.org/10.1109/JBHI.2020.2980262 Xu et al. [2019] Xu, L., Skoularidou, M., Cuesta-Infante, A., Veeramachaneni, K.: Modeling tabular data using conditional GAN. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 7333–7343 (2019) Goodfellow et al. [2014] Goodfellow, I.J., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., Bengio, Y.: Generative Adversarial Networks. arXiv, preprint (2014). https://doi.org/10.48550/arXiv.1406.2661 Sun and Erath [2015] Sun, L., Erath, A.: A bayesian network approach for population synthesis. Transportation Research Part C: Emerging Technologies 61, 49–62 (2015) https://doi.org/10.1016/j.trc.2015.10.010 Reiter [2005] Reiter, J.P.: Using cart to generate partially synthetic public use microdata. Journal of Official Statistics 21(3), 441–462 (2005) Drechsler and Reiter [2011] Drechsler, J., Reiter, J.P.: An empirical evaluation of easily implemented, nonparametric methods for generating synthetic datasets. Comput. 
Stat. Data Anal. 55(12), 3232–3243 (2011) European Medicines Agency [2018] European Medicines Agency: External guidance on the implementation of the European Medicines Agency policy on the publication of clinical data for medicinal products for human use. https://www.ema.europa.eu/en/human-regulatory/marketing-authorisation/clinical-data-publication/support-industry/external-guidance-implementation-european-medicines-agency-policy-publication-clinical-data (2018) Health Canada [2019] Health Canada: Public Release of Clinical Information: guidance document. https://www.canada.ca/en/health-canada/services/drug-health-product-review-approval/profile-public-release-clinical-information-guidance/document.html (2019) Karr et al. [2006] Karr, A.F., Kohnen, C.N., Oganian, A., Reiter, J.P., Sanil, A.P.: A framework for evaluating the utility of data altered to protect confidentiality. The American Statistician 60(3), 224–232 (2006) https://doi.org/10.1198/000313006x124640 Zhu et al. [2022] Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). https://doi.org/10.1109/BigData55660.2022.10020639 Villani [2009] Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. [2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. 
IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. 
[2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Murtaza, H., Ahmed, M., Khan, N.F., Murtaza, G., Zafar, S., Bano, A.: Synthetic data generation: State of the art in health care domain. Computer Science Review 48, 100546 (2023) https://doi.org/10.1016/j.cosrev.2023.100546 [22] Lautrup, A.D., Hyrup, T., Zimek, A., Schneider-Kamp, P.: Systematised review of generative modelling tools and utility metrics for fully synthetic tabular data. [under review] Gower [1971] Gower, J.C.: A general coefficient of similarity and some of its properties. Biometrics 27(4), 857–871 (1971) https://doi.org/10.2307/2528823 Kamal et al. [2019] Kamal, S., ElEleimy, M., Hegazy, D., Nasr, M.: Hepatitis C Virus (HCV) for Egyptian patients. UCI Machine Learning Repository, dataset (2019). https://doi.org/10.24432/c5989v Chawla et al. [2002] Chawla, N.V., Bowyer, K.W., Hall, L.O., Kegelmeyer, W.P.: SMOTE: synthetic minority over-sampling technique. J. Artif. Intell. Res. 16, 321–357 (2002) https://doi.org/10.1613/jair.953 Yoon et al. [2020] Yoon, J., Drumright, L.N., Schaar, M.: Anonymization through data synthesis using generative adversarial networks (ADS-GAN). IEEE J. Biomed. Health Informatics 24(8), 2378–2388 (2020) https://doi.org/10.1109/JBHI.2020.2980262 Xu et al. [2019] Xu, L., Skoularidou, M., Cuesta-Infante, A., Veeramachaneni, K.: Modeling tabular data using conditional GAN. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) 
Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 7333–7343 (2019) Goodfellow et al. [2014] Goodfellow, I.J., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., Bengio, Y.: Generative Adversarial Networks. arXiv, preprint (2014). https://doi.org/10.48550/arXiv.1406.2661 Sun and Erath [2015] Sun, L., Erath, A.: A bayesian network approach for population synthesis. Transportation Research Part C: Emerging Technologies 61, 49–62 (2015) https://doi.org/10.1016/j.trc.2015.10.010 Reiter [2005] Reiter, J.P.: Using cart to generate partially synthetic public use microdata. Journal of Official Statistics 21(3), 441–462 (2005) Drechsler and Reiter [2011] Drechsler, J., Reiter, J.P.: An empirical evaluation of easily implemented, nonparametric methods for generating synthetic datasets. Comput. Stat. Data Anal. 55(12), 3232–3243 (2011) European Medicines Agency [2018] European Medicines Agency: External guidance on the implementation of the European Medicines Agency policy on the publication of clinical data for medicinal products for human use. https://www.ema.europa.eu/en/human-regulatory/marketing-authorisation/clinical-data-publication/support-industry/external-guidance-implementation-european-medicines-agency-policy-publication-clinical-data (2018) Health Canada [2019] Health Canada: Public Release of Clinical Information: guidance document. https://www.canada.ca/en/health-canada/services/drug-health-product-review-approval/profile-public-release-clinical-information-guidance/document.html (2019) Karr et al. [2006] Karr, A.F., Kohnen, C.N., Oganian, A., Reiter, J.P., Sanil, A.P.: A framework for evaluating the utility of data altered to protect confidentiality. The American Statistician 60(3), 224–232 (2006) https://doi.org/10.1198/000313006x124640 Zhu et al. 
[2022] Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). https://doi.org/10.1109/BigData55660.2022.10020639 Villani [2009] Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. [2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. 
Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Lautrup, A.D., Hyrup, T., Zimek, A., Schneider-Kamp, P.: Systematised review of generative modelling tools and utility metrics for fully synthetic tabular data. [under review] Gower [1971] Gower, J.C.: A general coefficient of similarity and some of its properties. Biometrics 27(4), 857–871 (1971) https://doi.org/10.2307/2528823 Kamal et al. 
[2019] Kamal, S., ElEleimy, M., Hegazy, D., Nasr, M.: Hepatitis C Virus (HCV) for Egyptian patients. UCI Machine Learning Repository, dataset (2019). https://doi.org/10.24432/c5989v Chawla et al. [2002] Chawla, N.V., Bowyer, K.W., Hall, L.O., Kegelmeyer, W.P.: SMOTE: synthetic minority over-sampling technique. J. Artif. Intell. Res. 16, 321–357 (2002) https://doi.org/10.1613/jair.953 Yoon et al. [2020] Yoon, J., Drumright, L.N., Schaar, M.: Anonymization through data synthesis using generative adversarial networks (ADS-GAN). IEEE J. Biomed. Health Informatics 24(8), 2378–2388 (2020) https://doi.org/10.1109/JBHI.2020.2980262 Xu et al. [2019] Xu, L., Skoularidou, M., Cuesta-Infante, A., Veeramachaneni, K.: Modeling tabular data using conditional GAN. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 7333–7343 (2019) Goodfellow et al. [2014] Goodfellow, I.J., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., Bengio, Y.: Generative Adversarial Networks. arXiv, preprint (2014). https://doi.org/10.48550/arXiv.1406.2661 Sun and Erath [2015] Sun, L., Erath, A.: A bayesian network approach for population synthesis. Transportation Research Part C: Emerging Technologies 61, 49–62 (2015) https://doi.org/10.1016/j.trc.2015.10.010 Reiter [2005] Reiter, J.P.: Using cart to generate partially synthetic public use microdata. Journal of Official Statistics 21(3), 441–462 (2005) Drechsler and Reiter [2011] Drechsler, J., Reiter, J.P.: An empirical evaluation of easily implemented, nonparametric methods for generating synthetic datasets. Comput. Stat. Data Anal. 
55(12), 3232–3243 (2011) European Medicines Agency [2018] European Medicines Agency: External guidance on the implementation of the European Medicines Agency policy on the publication of clinical data for medicinal products for human use. https://www.ema.europa.eu/en/human-regulatory/marketing-authorisation/clinical-data-publication/support-industry/external-guidance-implementation-european-medicines-agency-policy-publication-clinical-data (2018) Health Canada [2019] Health Canada: Public Release of Clinical Information: guidance document. https://www.canada.ca/en/health-canada/services/drug-health-product-review-approval/profile-public-release-clinical-information-guidance/document.html (2019) Karr et al. [2006] Karr, A.F., Kohnen, C.N., Oganian, A., Reiter, J.P., Sanil, A.P.: A framework for evaluating the utility of data altered to protect confidentiality. The American Statistician 60(3), 224–232 (2006) https://doi.org/10.1198/000313006x124640 Zhu et al. [2022] Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). https://doi.org/10.1109/BigData55660.2022.10020639 Villani [2009] Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. [2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. 
IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. 
[2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Gower, J.C.: A general coefficient of similarity and some of its properties. Biometrics 27(4), 857–871 (1971) https://doi.org/10.2307/2528823 Kamal et al. [2019] Kamal, S., ElEleimy, M., Hegazy, D., Nasr, M.: Hepatitis C Virus (HCV) for Egyptian patients. UCI Machine Learning Repository, dataset (2019). https://doi.org/10.24432/c5989v Chawla et al. [2002] Chawla, N.V., Bowyer, K.W., Hall, L.O., Kegelmeyer, W.P.: SMOTE: synthetic minority over-sampling technique. J. Artif. Intell. Res. 16, 321–357 (2002) https://doi.org/10.1613/jair.953 Yoon et al. [2020] Yoon, J., Drumright, L.N., Schaar, M.: Anonymization through data synthesis using generative adversarial networks (ADS-GAN). IEEE J. Biomed. Health Informatics 24(8), 2378–2388 (2020) https://doi.org/10.1109/JBHI.2020.2980262 Xu et al. [2019] Xu, L., Skoularidou, M., Cuesta-Infante, A., Veeramachaneni, K.: Modeling tabular data using conditional GAN. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 7333–7343 (2019) Goodfellow et al. [2014] Goodfellow, I.J., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., Bengio, Y.: Generative Adversarial Networks. arXiv, preprint (2014). https://doi.org/10.48550/arXiv.1406.2661 Sun and Erath [2015] Sun, L., Erath, A.: A bayesian network approach for population synthesis. 
Transportation Research Part C: Emerging Technologies 61, 49–62 (2015) https://doi.org/10.1016/j.trc.2015.10.010 Reiter [2005] Reiter, J.P.: Using cart to generate partially synthetic public use microdata. Journal of Official Statistics 21(3), 441–462 (2005) Drechsler and Reiter [2011] Drechsler, J., Reiter, J.P.: An empirical evaluation of easily implemented, nonparametric methods for generating synthetic datasets. Comput. Stat. Data Anal. 55(12), 3232–3243 (2011) European Medicines Agency [2018] European Medicines Agency: External guidance on the implementation of the European Medicines Agency policy on the publication of clinical data for medicinal products for human use. https://www.ema.europa.eu/en/human-regulatory/marketing-authorisation/clinical-data-publication/support-industry/external-guidance-implementation-european-medicines-agency-policy-publication-clinical-data (2018) Health Canada [2019] Health Canada: Public Release of Clinical Information: guidance document. https://www.canada.ca/en/health-canada/services/drug-health-product-review-approval/profile-public-release-clinical-information-guidance/document.html (2019) Karr et al. [2006] Karr, A.F., Kohnen, C.N., Oganian, A., Reiter, J.P., Sanil, A.P.: A framework for evaluating the utility of data altered to protect confidentiality. The American Statistician 60(3), 224–232 (2006) https://doi.org/10.1198/000313006x124640 Zhu et al. [2022] Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). https://doi.org/10.1109/BigData55660.2022.10020639 Villani [2009] Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. 
IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Drechsler, J., Reiter, J.P.: An empirical evaluation of easily implemented, nonparametric methods for generating synthetic datasets. Comput. Stat. Data Anal. 55(12), 3232–3243 (2011) European Medicines Agency [2018] European Medicines Agency: External guidance on the implementation of the European Medicines Agency policy on the publication of clinical data for medicinal products for human use. https://www.ema.europa.eu/en/human-regulatory/marketing-authorisation/clinical-data-publication/support-industry/external-guidance-implementation-european-medicines-agency-policy-publication-clinical-data (2018) Health Canada [2019] Health Canada: Public Release of Clinical Information: guidance document. https://www.canada.ca/en/health-canada/services/drug-health-product-review-approval/profile-public-release-clinical-information-guidance/document.html (2019) Karr et al. 
[2006] Karr, A.F., Kohnen, C.N., Oganian, A., Reiter, J.P., Sanil, A.P.: A framework for evaluating the utility of data altered to protect confidentiality. The American Statistician 60(3), 224–232 (2006) https://doi.org/10.1198/000313006x124640 Zhu et al. [2022] Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). https://doi.org/10.1109/BigData55660.2022.10020639 Villani [2009] Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. [2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. 
[2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 European Medicines Agency: External guidance on the implementation of the European Medicines Agency policy on the publication of clinical data for medicinal products for human use. 
https://www.ema.europa.eu/en/human-regulatory/marketing-authorisation/clinical-data-publication/support-industry/external-guidance-implementation-european-medicines-agency-policy-publication-clinical-data (2018) Health Canada [2019] Health Canada: Public Release of Clinical Information: guidance document. https://www.canada.ca/en/health-canada/services/drug-health-product-review-approval/profile-public-release-clinical-information-guidance/document.html (2019) Karr et al. [2006] Karr, A.F., Kohnen, C.N., Oganian, A., Reiter, J.P., Sanil, A.P.: A framework for evaluating the utility of data altered to protect confidentiality. The American Statistician 60(3), 224–232 (2006) https://doi.org/10.1198/000313006x124640 Zhu et al. [2022] Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). https://doi.org/10.1109/BigData55660.2022.10020639 Villani [2009] Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. [2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. 
[2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. 
[2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Health Canada: Public Release of Clinical Information: guidance document. https://www.canada.ca/en/health-canada/services/drug-health-product-review-approval/profile-public-release-clinical-information-guidance/document.html (2019) Karr et al. [2006] Karr, A.F., Kohnen, C.N., Oganian, A., Reiter, J.P., Sanil, A.P.: A framework for evaluating the utility of data altered to protect confidentiality. The American Statistician 60(3), 224–232 (2006) https://doi.org/10.1198/000313006x124640 Zhu et al. [2022] Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). https://doi.org/10.1109/BigData55660.2022.10020639 Villani [2009] Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. [2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. 
[2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. 
[2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Karr, A.F., Kohnen, C.N., Oganian, A., Reiter, J.P., Sanil, A.P.: A framework for evaluating the utility of data altered to protect confidentiality. The American Statistician 60(3), 224–232 (2006) https://doi.org/10.1198/000313006x124640 Zhu et al. [2022] Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). https://doi.org/10.1109/BigData55660.2022.10020639 Villani [2009] Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. [2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. 
arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. 
In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). https://doi.org/10.1109/BigData55660.2022.10020639 Villani [2009] Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. [2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 
14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. [2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. 
IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. 
[2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 
14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. 
[2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. 
JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. 
[2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. 
[2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. 
[2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. 
[2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. 
JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. 
AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083
  17. Brenninkmeijer, B.: Table Evaluator. GitHub (2021)
Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 7333–7343 (2019) Goodfellow et al. [2014] Goodfellow, I.J., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., Bengio, Y.: Generative Adversarial Networks. arXiv, preprint (2014). https://doi.org/10.48550/arXiv.1406.2661 Sun and Erath [2015] Sun, L., Erath, A.: A bayesian network approach for population synthesis. Transportation Research Part C: Emerging Technologies 61, 49–62 (2015) https://doi.org/10.1016/j.trc.2015.10.010 Reiter [2005] Reiter, J.P.: Using cart to generate partially synthetic public use microdata. Journal of Official Statistics 21(3), 441–462 (2005) Drechsler and Reiter [2011] Drechsler, J., Reiter, J.P.: An empirical evaluation of easily implemented, nonparametric methods for generating synthetic datasets. Comput. Stat. Data Anal. 55(12), 3232–3243 (2011) European Medicines Agency [2018] European Medicines Agency: External guidance on the implementation of the European Medicines Agency policy on the publication of clinical data for medicinal products for human use. https://www.ema.europa.eu/en/human-regulatory/marketing-authorisation/clinical-data-publication/support-industry/external-guidance-implementation-european-medicines-agency-policy-publication-clinical-data (2018) Health Canada [2019] Health Canada: Public Release of Clinical Information: guidance document. https://www.canada.ca/en/health-canada/services/drug-health-product-review-approval/profile-public-release-clinical-information-guidance/document.html (2019) Karr et al. [2006] Karr, A.F., Kohnen, C.N., Oganian, A., Reiter, J.P., Sanil, A.P.: A framework for evaluating the utility of data altered to protect confidentiality. The American Statistician 60(3), 224–232 (2006) https://doi.org/10.1198/000313006x124640 Zhu et al. 
[2022] Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). https://doi.org/10.1109/BigData55660.2022.10020639 Villani [2009] Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. [2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. 
Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Lautrup, A.D., Hyrup, T., Zimek, A., Schneider-Kamp, P.: Systematised review of generative modelling tools and utility metrics for fully synthetic tabular data. [under review] Gower [1971] Gower, J.C.: A general coefficient of similarity and some of its properties. Biometrics 27(4), 857–871 (1971) https://doi.org/10.2307/2528823 Kamal et al. 
[2019] Kamal, S., ElEleimy, M., Hegazy, D., Nasr, M.: Hepatitis C Virus (HCV) for Egyptian patients. UCI Machine Learning Repository, dataset (2019). https://doi.org/10.24432/c5989v Chawla et al. [2002] Chawla, N.V., Bowyer, K.W., Hall, L.O., Kegelmeyer, W.P.: SMOTE: synthetic minority over-sampling technique. J. Artif. Intell. Res. 16, 321–357 (2002) https://doi.org/10.1613/jair.953 Yoon et al. [2020] Yoon, J., Drumright, L.N., Schaar, M.: Anonymization through data synthesis using generative adversarial networks (ADS-GAN). IEEE J. Biomed. Health Informatics 24(8), 2378–2388 (2020) https://doi.org/10.1109/JBHI.2020.2980262 Xu et al. [2019] Xu, L., Skoularidou, M., Cuesta-Infante, A., Veeramachaneni, K.: Modeling tabular data using conditional GAN. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 7333–7343 (2019) Goodfellow et al. [2014] Goodfellow, I.J., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., Bengio, Y.: Generative Adversarial Networks. arXiv, preprint (2014). https://doi.org/10.48550/arXiv.1406.2661 Sun and Erath [2015] Sun, L., Erath, A.: A bayesian network approach for population synthesis. Transportation Research Part C: Emerging Technologies 61, 49–62 (2015) https://doi.org/10.1016/j.trc.2015.10.010 Reiter [2005] Reiter, J.P.: Using cart to generate partially synthetic public use microdata. Journal of Official Statistics 21(3), 441–462 (2005) Drechsler and Reiter [2011] Drechsler, J., Reiter, J.P.: An empirical evaluation of easily implemented, nonparametric methods for generating synthetic datasets. Comput. Stat. Data Anal. 
55(12), 3232–3243 (2011) European Medicines Agency [2018] European Medicines Agency: External guidance on the implementation of the European Medicines Agency policy on the publication of clinical data for medicinal products for human use. https://www.ema.europa.eu/en/human-regulatory/marketing-authorisation/clinical-data-publication/support-industry/external-guidance-implementation-european-medicines-agency-policy-publication-clinical-data (2018) Health Canada [2019] Health Canada: Public Release of Clinical Information: guidance document. https://www.canada.ca/en/health-canada/services/drug-health-product-review-approval/profile-public-release-clinical-information-guidance/document.html (2019) Karr et al. [2006] Karr, A.F., Kohnen, C.N., Oganian, A., Reiter, J.P., Sanil, A.P.: A framework for evaluating the utility of data altered to protect confidentiality. The American Statistician 60(3), 224–232 (2006) https://doi.org/10.1198/000313006x124640 Zhu et al. [2022] Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). https://doi.org/10.1109/BigData55660.2022.10020639 Villani [2009] Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. [2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. 
IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. 
[2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Gower, J.C.: A general coefficient of similarity and some of its properties. Biometrics 27(4), 857–871 (1971) https://doi.org/10.2307/2528823 Kamal et al. [2019] Kamal, S., ElEleimy, M., Hegazy, D., Nasr, M.: Hepatitis C Virus (HCV) for Egyptian patients. UCI Machine Learning Repository, dataset (2019). https://doi.org/10.24432/c5989v Chawla et al. [2002] Chawla, N.V., Bowyer, K.W., Hall, L.O., Kegelmeyer, W.P.: SMOTE: synthetic minority over-sampling technique. J. Artif. Intell. Res. 16, 321–357 (2002) https://doi.org/10.1613/jair.953 Yoon et al. [2020] Yoon, J., Drumright, L.N., Schaar, M.: Anonymization through data synthesis using generative adversarial networks (ADS-GAN). IEEE J. Biomed. Health Informatics 24(8), 2378–2388 (2020) https://doi.org/10.1109/JBHI.2020.2980262 Xu et al. [2019] Xu, L., Skoularidou, M., Cuesta-Infante, A., Veeramachaneni, K.: Modeling tabular data using conditional GAN. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 7333–7343 (2019) Goodfellow et al. [2014] Goodfellow, I.J., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., Bengio, Y.: Generative Adversarial Networks. arXiv, preprint (2014). https://doi.org/10.48550/arXiv.1406.2661 Sun and Erath [2015] Sun, L., Erath, A.: A bayesian network approach for population synthesis. 
Transportation Research Part C: Emerging Technologies 61, 49–62 (2015) https://doi.org/10.1016/j.trc.2015.10.010 Reiter [2005] Reiter, J.P.: Using cart to generate partially synthetic public use microdata. Journal of Official Statistics 21(3), 441–462 (2005) Drechsler and Reiter [2011] Drechsler, J., Reiter, J.P.: An empirical evaluation of easily implemented, nonparametric methods for generating synthetic datasets. Comput. Stat. Data Anal. 55(12), 3232–3243 (2011) European Medicines Agency [2018] European Medicines Agency: External guidance on the implementation of the European Medicines Agency policy on the publication of clinical data for medicinal products for human use. https://www.ema.europa.eu/en/human-regulatory/marketing-authorisation/clinical-data-publication/support-industry/external-guidance-implementation-european-medicines-agency-policy-publication-clinical-data (2018) Health Canada [2019] Health Canada: Public Release of Clinical Information: guidance document. https://www.canada.ca/en/health-canada/services/drug-health-product-review-approval/profile-public-release-clinical-information-guidance/document.html (2019) Karr et al. [2006] Karr, A.F., Kohnen, C.N., Oganian, A., Reiter, J.P., Sanil, A.P.: A framework for evaluating the utility of data altered to protect confidentiality. The American Statistician 60(3), 224–232 (2006) https://doi.org/10.1198/000313006x124640 Zhu et al. [2022] Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). https://doi.org/10.1109/BigData55660.2022.10020639 Villani [2009] Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. 
[2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). 
https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Kamal, S., ElEleimy, M., Hegazy, D., Nasr, M.: Hepatitis C Virus (HCV) for Egyptian patients. UCI Machine Learning Repository, dataset (2019). https://doi.org/10.24432/c5989v Chawla et al. [2002] Chawla, N.V., Bowyer, K.W., Hall, L.O., Kegelmeyer, W.P.: SMOTE: synthetic minority over-sampling technique. J. Artif. Intell. Res. 16, 321–357 (2002) https://doi.org/10.1613/jair.953 Yoon et al. [2020] Yoon, J., Drumright, L.N., Schaar, M.: Anonymization through data synthesis using generative adversarial networks (ADS-GAN). IEEE J. Biomed. Health Informatics 24(8), 2378–2388 (2020) https://doi.org/10.1109/JBHI.2020.2980262 Xu et al. [2019] Xu, L., Skoularidou, M., Cuesta-Infante, A., Veeramachaneni, K.: Modeling tabular data using conditional GAN. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 
7333–7343 (2019) Goodfellow et al. [2014] Goodfellow, I.J., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., Bengio, Y.: Generative Adversarial Networks. arXiv, preprint (2014). https://doi.org/10.48550/arXiv.1406.2661 Sun and Erath [2015] Sun, L., Erath, A.: A bayesian network approach for population synthesis. Transportation Research Part C: Emerging Technologies 61, 49–62 (2015) https://doi.org/10.1016/j.trc.2015.10.010 Reiter [2005] Reiter, J.P.: Using cart to generate partially synthetic public use microdata. Journal of Official Statistics 21(3), 441–462 (2005) Drechsler and Reiter [2011] Drechsler, J., Reiter, J.P.: An empirical evaluation of easily implemented, nonparametric methods for generating synthetic datasets. Comput. Stat. Data Anal. 55(12), 3232–3243 (2011) European Medicines Agency [2018] European Medicines Agency: External guidance on the implementation of the European Medicines Agency policy on the publication of clinical data for medicinal products for human use. https://www.ema.europa.eu/en/human-regulatory/marketing-authorisation/clinical-data-publication/support-industry/external-guidance-implementation-european-medicines-agency-policy-publication-clinical-data (2018) Health Canada [2019] Health Canada: Public Release of Clinical Information: guidance document. https://www.canada.ca/en/health-canada/services/drug-health-product-review-approval/profile-public-release-clinical-information-guidance/document.html (2019) Karr et al. [2006] Karr, A.F., Kohnen, C.N., Oganian, A., Reiter, J.P., Sanil, A.P.: A framework for evaluating the utility of data altered to protect confidentiality. The American Statistician 60(3), 224–232 (2006) https://doi.org/10.1198/000313006x124640 Zhu et al. [2022] Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) 
IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). https://doi.org/10.1109/BigData55660.2022.10020639 Villani [2009] Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. [2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. 
[2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Chawla, N.V., Bowyer, K.W., Hall, L.O., Kegelmeyer, W.P.: SMOTE: synthetic minority over-sampling technique. J. Artif. Intell. Res. 16, 321–357 (2002) https://doi.org/10.1613/jair.953 Yoon et al. [2020] Yoon, J., Drumright, L.N., Schaar, M.: Anonymization through data synthesis using generative adversarial networks (ADS-GAN). IEEE J. Biomed. Health Informatics 24(8), 2378–2388 (2020) https://doi.org/10.1109/JBHI.2020.2980262 Xu et al. [2019] Xu, L., Skoularidou, M., Cuesta-Infante, A., Veeramachaneni, K.: Modeling tabular data using conditional GAN. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) 
Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 7333–7343 (2019) Goodfellow et al. [2014] Goodfellow, I.J., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., Bengio, Y.: Generative Adversarial Networks. arXiv, preprint (2014). https://doi.org/10.48550/arXiv.1406.2661 Sun and Erath [2015] Sun, L., Erath, A.: A bayesian network approach for population synthesis. Transportation Research Part C: Emerging Technologies 61, 49–62 (2015) https://doi.org/10.1016/j.trc.2015.10.010 Reiter [2005] Reiter, J.P.: Using cart to generate partially synthetic public use microdata. Journal of Official Statistics 21(3), 441–462 (2005) Drechsler and Reiter [2011] Drechsler, J., Reiter, J.P.: An empirical evaluation of easily implemented, nonparametric methods for generating synthetic datasets. Comput. Stat. Data Anal. 55(12), 3232–3243 (2011) European Medicines Agency [2018] European Medicines Agency: External guidance on the implementation of the European Medicines Agency policy on the publication of clinical data for medicinal products for human use. https://www.ema.europa.eu/en/human-regulatory/marketing-authorisation/clinical-data-publication/support-industry/external-guidance-implementation-european-medicines-agency-policy-publication-clinical-data (2018) Health Canada [2019] Health Canada: Public Release of Clinical Information: guidance document. https://www.canada.ca/en/health-canada/services/drug-health-product-review-approval/profile-public-release-clinical-information-guidance/document.html (2019) Karr et al. [2006] Karr, A.F., Kohnen, C.N., Oganian, A., Reiter, J.P., Sanil, A.P.: A framework for evaluating the utility of data altered to protect confidentiality. The American Statistician 60(3), 224–232 (2006) https://doi.org/10.1198/000313006x124640 Zhu et al. 
[2022] Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). https://doi.org/10.1109/BigData55660.2022.10020639 Villani [2009] Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. [2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. 
Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Yoon, J., Drumright, L.N., Schaar, M.: Anonymization through data synthesis using generative adversarial networks (ADS-GAN). IEEE J. Biomed. Health Informatics 24(8), 2378–2388 (2020) https://doi.org/10.1109/JBHI.2020.2980262 Xu et al. 
[2019] Xu, L., Skoularidou, M., Cuesta-Infante, A., Veeramachaneni, K.: Modeling tabular data using conditional GAN. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 7333–7343 (2019) Goodfellow et al. [2014] Goodfellow, I.J., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., Bengio, Y.: Generative Adversarial Networks. arXiv, preprint (2014). https://doi.org/10.48550/arXiv.1406.2661 Sun and Erath [2015] Sun, L., Erath, A.: A bayesian network approach for population synthesis. Transportation Research Part C: Emerging Technologies 61, 49–62 (2015) https://doi.org/10.1016/j.trc.2015.10.010 Reiter [2005] Reiter, J.P.: Using cart to generate partially synthetic public use microdata. Journal of Official Statistics 21(3), 441–462 (2005) Drechsler and Reiter [2011] Drechsler, J., Reiter, J.P.: An empirical evaluation of easily implemented, nonparametric methods for generating synthetic datasets. Comput. Stat. Data Anal. 55(12), 3232–3243 (2011) European Medicines Agency [2018] European Medicines Agency: External guidance on the implementation of the European Medicines Agency policy on the publication of clinical data for medicinal products for human use. https://www.ema.europa.eu/en/human-regulatory/marketing-authorisation/clinical-data-publication/support-industry/external-guidance-implementation-european-medicines-agency-policy-publication-clinical-data (2018) Health Canada [2019] Health Canada: Public Release of Clinical Information: guidance document. https://www.canada.ca/en/health-canada/services/drug-health-product-review-approval/profile-public-release-clinical-information-guidance/document.html (2019) Karr et al. 
[2006] Karr, A.F., Kohnen, C.N., Oganian, A., Reiter, J.P., Sanil, A.P.: A framework for evaluating the utility of data altered to protect confidentiality. The American Statistician 60(3), 224–232 (2006) https://doi.org/10.1198/000313006x124640 Zhu et al. [2022] Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). https://doi.org/10.1109/BigData55660.2022.10020639 Villani [2009] Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. [2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. 
[2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Xu, L., Skoularidou, M., Cuesta-Infante, A., Veeramachaneni, K.: Modeling tabular data using conditional GAN. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) 
Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 7333–7343 (2019) Goodfellow et al. [2014] Goodfellow, I.J., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., Bengio, Y.: Generative Adversarial Networks. arXiv, preprint (2014). https://doi.org/10.48550/arXiv.1406.2661 Sun and Erath [2015] Sun, L., Erath, A.: A bayesian network approach for population synthesis. Transportation Research Part C: Emerging Technologies 61, 49–62 (2015) https://doi.org/10.1016/j.trc.2015.10.010 Reiter [2005] Reiter, J.P.: Using cart to generate partially synthetic public use microdata. Journal of Official Statistics 21(3), 441–462 (2005) Drechsler and Reiter [2011] Drechsler, J., Reiter, J.P.: An empirical evaluation of easily implemented, nonparametric methods for generating synthetic datasets. Comput. Stat. Data Anal. 55(12), 3232–3243 (2011) European Medicines Agency [2018] European Medicines Agency: External guidance on the implementation of the European Medicines Agency policy on the publication of clinical data for medicinal products for human use. https://www.ema.europa.eu/en/human-regulatory/marketing-authorisation/clinical-data-publication/support-industry/external-guidance-implementation-european-medicines-agency-policy-publication-clinical-data (2018) Health Canada [2019] Health Canada: Public Release of Clinical Information: guidance document. https://www.canada.ca/en/health-canada/services/drug-health-product-review-approval/profile-public-release-clinical-information-guidance/document.html (2019) Karr et al. [2006] Karr, A.F., Kohnen, C.N., Oganian, A., Reiter, J.P., Sanil, A.P.: A framework for evaluating the utility of data altered to protect confidentiality. The American Statistician 60(3), 224–232 (2006) https://doi.org/10.1198/000313006x124640 Zhu et al. 
[2022] Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). https://doi.org/10.1109/BigData55660.2022.10020639 Villani [2009] Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. [2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. 
Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Goodfellow, I.J., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., Bengio, Y.: Generative Adversarial Networks. arXiv, preprint (2014). https://doi.org/10.48550/arXiv.1406.2661 Sun and Erath [2015] Sun, L., Erath, A.: A bayesian network approach for population synthesis. 
Transportation Research Part C: Emerging Technologies 61, 49–62 (2015) https://doi.org/10.1016/j.trc.2015.10.010 Reiter [2005] Reiter, J.P.: Using cart to generate partially synthetic public use microdata. Journal of Official Statistics 21(3), 441–462 (2005) Drechsler and Reiter [2011] Drechsler, J., Reiter, J.P.: An empirical evaluation of easily implemented, nonparametric methods for generating synthetic datasets. Comput. Stat. Data Anal. 55(12), 3232–3243 (2011) European Medicines Agency [2018] European Medicines Agency: External guidance on the implementation of the European Medicines Agency policy on the publication of clinical data for medicinal products for human use. https://www.ema.europa.eu/en/human-regulatory/marketing-authorisation/clinical-data-publication/support-industry/external-guidance-implementation-european-medicines-agency-policy-publication-clinical-data (2018) Health Canada [2019] Health Canada: Public Release of Clinical Information: guidance document. https://www.canada.ca/en/health-canada/services/drug-health-product-review-approval/profile-public-release-clinical-information-guidance/document.html (2019) Karr et al. [2006] Karr, A.F., Kohnen, C.N., Oganian, A., Reiter, J.P., Sanil, A.P.: A framework for evaluating the utility of data altered to protect confidentiality. The American Statistician 60(3), 224–232 (2006) https://doi.org/10.1198/000313006x124640 Zhu et al. [2022] Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). https://doi.org/10.1109/BigData55660.2022.10020639 Villani [2009] Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. 
[2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). 
https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Sun, L., Erath, A.: A bayesian network approach for population synthesis. Transportation Research Part C: Emerging Technologies 61, 49–62 (2015) https://doi.org/10.1016/j.trc.2015.10.010 Reiter [2005] Reiter, J.P.: Using cart to generate partially synthetic public use microdata. Journal of Official Statistics 21(3), 441–462 (2005) Drechsler and Reiter [2011] Drechsler, J., Reiter, J.P.: An empirical evaluation of easily implemented, nonparametric methods for generating synthetic datasets. Comput. Stat. Data Anal. 55(12), 3232–3243 (2011) European Medicines Agency [2018] European Medicines Agency: External guidance on the implementation of the European Medicines Agency policy on the publication of clinical data for medicinal products for human use. 
https://www.ema.europa.eu/en/human-regulatory/marketing-authorisation/clinical-data-publication/support-industry/external-guidance-implementation-european-medicines-agency-policy-publication-clinical-data (2018) Health Canada [2019] Health Canada: Public Release of Clinical Information: guidance document. https://www.canada.ca/en/health-canada/services/drug-health-product-review-approval/profile-public-release-clinical-information-guidance/document.html (2019) Karr et al. [2006] Karr, A.F., Kohnen, C.N., Oganian, A., Reiter, J.P., Sanil, A.P.: A framework for evaluating the utility of data altered to protect confidentiality. The American Statistician 60(3), 224–232 (2006) https://doi.org/10.1198/000313006x124640 Zhu et al. [2022] Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). https://doi.org/10.1109/BigData55660.2022.10020639 Villani [2009] Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. [2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. 
[2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. 
[2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Reiter, J.P.: Using cart to generate partially synthetic public use microdata. Journal of Official Statistics 21(3), 441–462 (2005) Drechsler and Reiter [2011] Drechsler, J., Reiter, J.P.: An empirical evaluation of easily implemented, nonparametric methods for generating synthetic datasets. Comput. Stat. Data Anal. 55(12), 3232–3243 (2011) European Medicines Agency [2018] European Medicines Agency: External guidance on the implementation of the European Medicines Agency policy on the publication of clinical data for medicinal products for human use. https://www.ema.europa.eu/en/human-regulatory/marketing-authorisation/clinical-data-publication/support-industry/external-guidance-implementation-european-medicines-agency-policy-publication-clinical-data (2018) Health Canada [2019] Health Canada: Public Release of Clinical Information: guidance document. https://www.canada.ca/en/health-canada/services/drug-health-product-review-approval/profile-public-release-clinical-information-guidance/document.html (2019) Karr et al. [2006] Karr, A.F., Kohnen, C.N., Oganian, A., Reiter, J.P., Sanil, A.P.: A framework for evaluating the utility of data altered to protect confidentiality. The American Statistician 60(3), 224–232 (2006) https://doi.org/10.1198/000313006x124640 Zhu et al. [2022] Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). https://doi.org/10.1109/BigData55660.2022.10020639 Villani [2009] Villani, C.: Optimal Transport. Springer, Heidelberg (2009). 
https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. [2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). 
IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Drechsler, J., Reiter, J.P.: An empirical evaluation of easily implemented, nonparametric methods for generating synthetic datasets. Comput. Stat. Data Anal. 55(12), 3232–3243 (2011) European Medicines Agency [2018] European Medicines Agency: External guidance on the implementation of the European Medicines Agency policy on the publication of clinical data for medicinal products for human use. https://www.ema.europa.eu/en/human-regulatory/marketing-authorisation/clinical-data-publication/support-industry/external-guidance-implementation-european-medicines-agency-policy-publication-clinical-data (2018) Health Canada [2019] Health Canada: Public Release of Clinical Information: guidance document. https://www.canada.ca/en/health-canada/services/drug-health-product-review-approval/profile-public-release-clinical-information-guidance/document.html (2019) Karr et al. 
[2006] Karr, A.F., Kohnen, C.N., Oganian, A., Reiter, J.P., Sanil, A.P.: A framework for evaluating the utility of data altered to protect confidentiality. The American Statistician 60(3), 224–232 (2006) https://doi.org/10.1198/000313006x124640 Zhu et al. [2022] Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). https://doi.org/10.1109/BigData55660.2022.10020639 Villani [2009] Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. [2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. 
[2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 European Medicines Agency: External guidance on the implementation of the European Medicines Agency policy on the publication of clinical data for medicinal products for human use. 
https://www.ema.europa.eu/en/human-regulatory/marketing-authorisation/clinical-data-publication/support-industry/external-guidance-implementation-european-medicines-agency-policy-publication-clinical-data (2018) Health Canada [2019] Health Canada: Public Release of Clinical Information: guidance document. https://www.canada.ca/en/health-canada/services/drug-health-product-review-approval/profile-public-release-clinical-information-guidance/document.html (2019) Karr et al. [2006] Karr, A.F., Kohnen, C.N., Oganian, A., Reiter, J.P., Sanil, A.P.: A framework for evaluating the utility of data altered to protect confidentiality. The American Statistician 60(3), 224–232 (2006) https://doi.org/10.1198/000313006x124640 Zhu et al. [2022] Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). https://doi.org/10.1109/BigData55660.2022.10020639 Villani [2009] Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. [2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. 
[2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. 
[2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Health Canada: Public Release of Clinical Information: guidance document. https://www.canada.ca/en/health-canada/services/drug-health-product-review-approval/profile-public-release-clinical-information-guidance/document.html (2019) Karr et al. [2006] Karr, A.F., Kohnen, C.N., Oganian, A., Reiter, J.P., Sanil, A.P.: A framework for evaluating the utility of data altered to protect confidentiality. The American Statistician 60(3), 224–232 (2006) https://doi.org/10.1198/000313006x124640 Zhu et al. [2022] Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). https://doi.org/10.1109/BigData55660.2022.10020639 Villani [2009] Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. [2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. 
[2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. 
[2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Karr, A.F., Kohnen, C.N., Oganian, A., Reiter, J.P., Sanil, A.P.: A framework for evaluating the utility of data altered to protect confidentiality. The American Statistician 60(3), 224–232 (2006) https://doi.org/10.1198/000313006x124640 Zhu et al. [2022] Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). https://doi.org/10.1109/BigData55660.2022.10020639 Villani [2009] Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. [2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. 
arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. 
In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). https://doi.org/10.1109/BigData55660.2022.10020639 Villani [2009] Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. [2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 
14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. [2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. 
IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. 
[2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 
14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. 
[2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. 
JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. 
[2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. 
[2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. 
  18. Yale, A., Dash, S., Dutta, R., Guyon, I., Pavao, A., Bennett, K.P.: Privacy preserving synthetic health data. In: 27th European Symposium on Artificial Neural Networks, ESANN 2019, Bruges, Belgium, April 24-26, 2019 (2019)
Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Kamal, S., ElEleimy, M., Hegazy, D., Nasr, M.: Hepatitis C Virus (HCV) for Egyptian patients. UCI Machine Learning Repository, dataset (2019). https://doi.org/10.24432/c5989v Chawla et al. [2002] Chawla, N.V., Bowyer, K.W., Hall, L.O., Kegelmeyer, W.P.: SMOTE: synthetic minority over-sampling technique. J. Artif. Intell. Res. 
16, 321–357 (2002) https://doi.org/10.1613/jair.953 Yoon et al. [2020] Yoon, J., Drumright, L.N., Schaar, M.: Anonymization through data synthesis using generative adversarial networks (ADS-GAN). IEEE J. Biomed. Health Informatics 24(8), 2378–2388 (2020) https://doi.org/10.1109/JBHI.2020.2980262 Xu et al. [2019] Xu, L., Skoularidou, M., Cuesta-Infante, A., Veeramachaneni, K.: Modeling tabular data using conditional GAN. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 7333–7343 (2019) Goodfellow et al. [2014] Goodfellow, I.J., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., Bengio, Y.: Generative Adversarial Networks. arXiv, preprint (2014). https://doi.org/10.48550/arXiv.1406.2661 Sun and Erath [2015] Sun, L., Erath, A.: A bayesian network approach for population synthesis. Transportation Research Part C: Emerging Technologies 61, 49–62 (2015) https://doi.org/10.1016/j.trc.2015.10.010 Reiter [2005] Reiter, J.P.: Using cart to generate partially synthetic public use microdata. Journal of Official Statistics 21(3), 441–462 (2005) Drechsler and Reiter [2011] Drechsler, J., Reiter, J.P.: An empirical evaluation of easily implemented, nonparametric methods for generating synthetic datasets. Comput. Stat. Data Anal. 55(12), 3232–3243 (2011) European Medicines Agency [2018] European Medicines Agency: External guidance on the implementation of the European Medicines Agency policy on the publication of clinical data for medicinal products for human use. 
https://www.ema.europa.eu/en/human-regulatory/marketing-authorisation/clinical-data-publication/support-industry/external-guidance-implementation-european-medicines-agency-policy-publication-clinical-data (2018) Health Canada [2019] Health Canada: Public Release of Clinical Information: guidance document. https://www.canada.ca/en/health-canada/services/drug-health-product-review-approval/profile-public-release-clinical-information-guidance/document.html (2019) Karr et al. [2006] Karr, A.F., Kohnen, C.N., Oganian, A., Reiter, J.P., Sanil, A.P.: A framework for evaluating the utility of data altered to protect confidentiality. The American Statistician 60(3), 224–232 (2006) https://doi.org/10.1198/000313006x124640 Zhu et al. [2022] Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). https://doi.org/10.1109/BigData55660.2022.10020639 Villani [2009] Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. [2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. 
[2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. 
[2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Chawla, N.V., Bowyer, K.W., Hall, L.O., Kegelmeyer, W.P.: SMOTE: synthetic minority over-sampling technique. J. Artif. Intell. Res. 16, 321–357 (2002) https://doi.org/10.1613/jair.953 Yoon et al. [2020] Yoon, J., Drumright, L.N., Schaar, M.: Anonymization through data synthesis using generative adversarial networks (ADS-GAN). IEEE J. Biomed. Health Informatics 24(8), 2378–2388 (2020) https://doi.org/10.1109/JBHI.2020.2980262 Xu et al. [2019] Xu, L., Skoularidou, M., Cuesta-Infante, A., Veeramachaneni, K.: Modeling tabular data using conditional GAN. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 7333–7343 (2019) Goodfellow et al. [2014] Goodfellow, I.J., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., Bengio, Y.: Generative Adversarial Networks. arXiv, preprint (2014). https://doi.org/10.48550/arXiv.1406.2661 Sun and Erath [2015] Sun, L., Erath, A.: A bayesian network approach for population synthesis. Transportation Research Part C: Emerging Technologies 61, 49–62 (2015) https://doi.org/10.1016/j.trc.2015.10.010 Reiter [2005] Reiter, J.P.: Using cart to generate partially synthetic public use microdata. Journal of Official Statistics 21(3), 441–462 (2005) Drechsler and Reiter [2011] Drechsler, J., Reiter, J.P.: An empirical evaluation of easily implemented, nonparametric methods for generating synthetic datasets. Comput. Stat. Data Anal. 
55(12), 3232–3243 (2011) European Medicines Agency [2018] European Medicines Agency: External guidance on the implementation of the European Medicines Agency policy on the publication of clinical data for medicinal products for human use. https://www.ema.europa.eu/en/human-regulatory/marketing-authorisation/clinical-data-publication/support-industry/external-guidance-implementation-european-medicines-agency-policy-publication-clinical-data (2018) Health Canada [2019] Health Canada: Public Release of Clinical Information: guidance document. https://www.canada.ca/en/health-canada/services/drug-health-product-review-approval/profile-public-release-clinical-information-guidance/document.html (2019) Karr et al. [2006] Karr, A.F., Kohnen, C.N., Oganian, A., Reiter, J.P., Sanil, A.P.: A framework for evaluating the utility of data altered to protect confidentiality. The American Statistician 60(3), 224–232 (2006) https://doi.org/10.1198/000313006x124640 Zhu et al. [2022] Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). https://doi.org/10.1109/BigData55660.2022.10020639 Villani [2009] Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. [2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. 
IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. 
[2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Yoon, J., Drumright, L.N., Schaar, M.: Anonymization through data synthesis using generative adversarial networks (ADS-GAN). IEEE J. Biomed. Health Informatics 24(8), 2378–2388 (2020) https://doi.org/10.1109/JBHI.2020.2980262 Xu et al. [2019] Xu, L., Skoularidou, M., Cuesta-Infante, A., Veeramachaneni, K.: Modeling tabular data using conditional GAN. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 7333–7343 (2019) Goodfellow et al. [2014] Goodfellow, I.J., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., Bengio, Y.: Generative Adversarial Networks. arXiv, preprint (2014). https://doi.org/10.48550/arXiv.1406.2661 Sun and Erath [2015] Sun, L., Erath, A.: A bayesian network approach for population synthesis. Transportation Research Part C: Emerging Technologies 61, 49–62 (2015) https://doi.org/10.1016/j.trc.2015.10.010 Reiter [2005] Reiter, J.P.: Using cart to generate partially synthetic public use microdata. Journal of Official Statistics 21(3), 441–462 (2005) Drechsler and Reiter [2011] Drechsler, J., Reiter, J.P.: An empirical evaluation of easily implemented, nonparametric methods for generating synthetic datasets. Comput. Stat. Data Anal. 
55(12), 3232–3243 (2011) European Medicines Agency [2018] European Medicines Agency: External guidance on the implementation of the European Medicines Agency policy on the publication of clinical data for medicinal products for human use. https://www.ema.europa.eu/en/human-regulatory/marketing-authorisation/clinical-data-publication/support-industry/external-guidance-implementation-european-medicines-agency-policy-publication-clinical-data (2018) Health Canada [2019] Health Canada: Public Release of Clinical Information: guidance document. https://www.canada.ca/en/health-canada/services/drug-health-product-review-approval/profile-public-release-clinical-information-guidance/document.html (2019) Karr et al. [2006] Karr, A.F., Kohnen, C.N., Oganian, A., Reiter, J.P., Sanil, A.P.: A framework for evaluating the utility of data altered to protect confidentiality. The American Statistician 60(3), 224–232 (2006) https://doi.org/10.1198/000313006x124640 Zhu et al. [2022] Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). https://doi.org/10.1109/BigData55660.2022.10020639 Villani [2009] Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. [2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. 
IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. 
[2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Xu, L., Skoularidou, M., Cuesta-Infante, A., Veeramachaneni, K.: Modeling tabular data using conditional GAN. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 7333–7343 (2019) Goodfellow et al. [2014] Goodfellow, I.J., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., Bengio, Y.: Generative Adversarial Networks. arXiv, preprint (2014). https://doi.org/10.48550/arXiv.1406.2661 Sun and Erath [2015] Sun, L., Erath, A.: A bayesian network approach for population synthesis. Transportation Research Part C: Emerging Technologies 61, 49–62 (2015) https://doi.org/10.1016/j.trc.2015.10.010 Reiter [2005] Reiter, J.P.: Using cart to generate partially synthetic public use microdata. Journal of Official Statistics 21(3), 441–462 (2005) Drechsler and Reiter [2011] Drechsler, J., Reiter, J.P.: An empirical evaluation of easily implemented, nonparametric methods for generating synthetic datasets. Comput. Stat. Data Anal. 55(12), 3232–3243 (2011) European Medicines Agency [2018] European Medicines Agency: External guidance on the implementation of the European Medicines Agency policy on the publication of clinical data for medicinal products for human use. 
https://www.ema.europa.eu/en/human-regulatory/marketing-authorisation/clinical-data-publication/support-industry/external-guidance-implementation-european-medicines-agency-policy-publication-clinical-data (2018) Health Canada [2019] Health Canada: Public Release of Clinical Information: guidance document. https://www.canada.ca/en/health-canada/services/drug-health-product-review-approval/profile-public-release-clinical-information-guidance/document.html (2019) Karr et al. [2006] Karr, A.F., Kohnen, C.N., Oganian, A., Reiter, J.P., Sanil, A.P.: A framework for evaluating the utility of data altered to protect confidentiality. The American Statistician 60(3), 224–232 (2006) https://doi.org/10.1198/000313006x124640 Zhu et al. [2022] Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). https://doi.org/10.1109/BigData55660.2022.10020639 Villani [2009] Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. [2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. 
[2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. 
Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. 
In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. 
[2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. 
[2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. 
[2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 
13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. 
AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. 
[2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083
  19. Yan, C., Yan, Y., Wan, Z., Zhang, Z., Omberg, L., Guinney, J., Mooney, S.D., Malin, B.A.: A multifaceted benchmarking of synthetic electronic health record generation models. Nature Communications 13(1) (2022) https://doi.org/10.1038/s41467-022-35295-1
[2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). 
https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Gower, J.C.: A general coefficient of similarity and some of its properties. Biometrics 27(4), 857–871 (1971) https://doi.org/10.2307/2528823 Kamal et al. [2019] Kamal, S., ElEleimy, M., Hegazy, D., Nasr, M.: Hepatitis C Virus (HCV) for Egyptian patients. UCI Machine Learning Repository, dataset (2019). https://doi.org/10.24432/c5989v Chawla et al. [2002] Chawla, N.V., Bowyer, K.W., Hall, L.O., Kegelmeyer, W.P.: SMOTE: synthetic minority over-sampling technique. J. Artif. Intell. Res. 16, 321–357 (2002) https://doi.org/10.1613/jair.953 Yoon et al. [2020] Yoon, J., Drumright, L.N., Schaar, M.: Anonymization through data synthesis using generative adversarial networks (ADS-GAN). IEEE J. Biomed. Health Informatics 24(8), 2378–2388 (2020) https://doi.org/10.1109/JBHI.2020.2980262 Xu et al. [2019] Xu, L., Skoularidou, M., Cuesta-Infante, A., Veeramachaneni, K.: Modeling tabular data using conditional GAN. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) 
Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 7333–7343 (2019) Goodfellow et al. [2014] Goodfellow, I.J., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., Bengio, Y.: Generative Adversarial Networks. arXiv, preprint (2014). https://doi.org/10.48550/arXiv.1406.2661 Sun and Erath [2015] Sun, L., Erath, A.: A bayesian network approach for population synthesis. Transportation Research Part C: Emerging Technologies 61, 49–62 (2015) https://doi.org/10.1016/j.trc.2015.10.010 Reiter [2005] Reiter, J.P.: Using cart to generate partially synthetic public use microdata. Journal of Official Statistics 21(3), 441–462 (2005) Drechsler and Reiter [2011] Drechsler, J., Reiter, J.P.: An empirical evaluation of easily implemented, nonparametric methods for generating synthetic datasets. Comput. Stat. Data Anal. 55(12), 3232–3243 (2011) European Medicines Agency [2018] European Medicines Agency: External guidance on the implementation of the European Medicines Agency policy on the publication of clinical data for medicinal products for human use. https://www.ema.europa.eu/en/human-regulatory/marketing-authorisation/clinical-data-publication/support-industry/external-guidance-implementation-european-medicines-agency-policy-publication-clinical-data (2018) Health Canada [2019] Health Canada: Public Release of Clinical Information: guidance document. https://www.canada.ca/en/health-canada/services/drug-health-product-review-approval/profile-public-release-clinical-information-guidance/document.html (2019) Karr et al. [2006] Karr, A.F., Kohnen, C.N., Oganian, A., Reiter, J.P., Sanil, A.P.: A framework for evaluating the utility of data altered to protect confidentiality. The American Statistician 60(3), 224–232 (2006) https://doi.org/10.1198/000313006x124640 Zhu et al. 
[2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Goodfellow, I.J., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., Bengio, Y.: Generative Adversarial Networks. arXiv, preprint (2014). https://doi.org/10.48550/arXiv.1406.2661 Sun and Erath [2015] Sun, L., Erath, A.: A bayesian network approach for population synthesis. Transportation Research Part C: Emerging Technologies 61, 49–62 (2015) https://doi.org/10.1016/j.trc.2015.10.010 Reiter [2005] Reiter, J.P.: Using cart to generate partially synthetic public use microdata. Journal of Official Statistics 21(3), 441–462 (2005) Drechsler and Reiter [2011] Drechsler, J., Reiter, J.P.: An empirical evaluation of easily implemented, nonparametric methods for generating synthetic datasets. Comput. Stat. Data Anal. 55(12), 3232–3243 (2011) European Medicines Agency [2018] European Medicines Agency: External guidance on the implementation of the European Medicines Agency policy on the publication of clinical data for medicinal products for human use. https://www.ema.europa.eu/en/human-regulatory/marketing-authorisation/clinical-data-publication/support-industry/external-guidance-implementation-european-medicines-agency-policy-publication-clinical-data (2018) Health Canada [2019] Health Canada: Public Release of Clinical Information: guidance document. https://www.canada.ca/en/health-canada/services/drug-health-product-review-approval/profile-public-release-clinical-information-guidance/document.html (2019) Karr et al. [2006] Karr, A.F., Kohnen, C.N., Oganian, A., Reiter, J.P., Sanil, A.P.: A framework for evaluating the utility of data altered to protect confidentiality. The American Statistician 60(3), 224–232 (2006) https://doi.org/10.1198/000313006x124640 Zhu et al. [2022] Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. 
In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). https://doi.org/10.1109/BigData55660.2022.10020639 Villani [2009] Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. [2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 
14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Sun, L., Erath, A.: A bayesian network approach for population synthesis. Transportation Research Part C: Emerging Technologies 61, 49–62 (2015) https://doi.org/10.1016/j.trc.2015.10.010 Reiter [2005] Reiter, J.P.: Using cart to generate partially synthetic public use microdata. Journal of Official Statistics 21(3), 441–462 (2005) Drechsler and Reiter [2011] Drechsler, J., Reiter, J.P.: An empirical evaluation of easily implemented, nonparametric methods for generating synthetic datasets. Comput. Stat. Data Anal. 
55(12), 3232–3243 (2011) European Medicines Agency [2018] European Medicines Agency: External guidance on the implementation of the European Medicines Agency policy on the publication of clinical data for medicinal products for human use. https://www.ema.europa.eu/en/human-regulatory/marketing-authorisation/clinical-data-publication/support-industry/external-guidance-implementation-european-medicines-agency-policy-publication-clinical-data (2018) Health Canada [2019] Health Canada: Public Release of Clinical Information: guidance document. https://www.canada.ca/en/health-canada/services/drug-health-product-review-approval/profile-public-release-clinical-information-guidance/document.html (2019) Karr et al. [2006] Karr, A.F., Kohnen, C.N., Oganian, A., Reiter, J.P., Sanil, A.P.: A framework for evaluating the utility of data altered to protect confidentiality. The American Statistician 60(3), 224–232 (2006) https://doi.org/10.1198/000313006x124640 Zhu et al. [2022] Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). https://doi.org/10.1109/BigData55660.2022.10020639 Villani [2009] Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. [2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. 
IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. 
[2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Reiter, J.P.: Using cart to generate partially synthetic public use microdata. Journal of Official Statistics 21(3), 441–462 (2005) Drechsler and Reiter [2011] Drechsler, J., Reiter, J.P.: An empirical evaluation of easily implemented, nonparametric methods for generating synthetic datasets. Comput. Stat. Data Anal. 55(12), 3232–3243 (2011) European Medicines Agency [2018] European Medicines Agency: External guidance on the implementation of the European Medicines Agency policy on the publication of clinical data for medicinal products for human use. https://www.ema.europa.eu/en/human-regulatory/marketing-authorisation/clinical-data-publication/support-industry/external-guidance-implementation-european-medicines-agency-policy-publication-clinical-data (2018) Health Canada [2019] Health Canada: Public Release of Clinical Information: guidance document. https://www.canada.ca/en/health-canada/services/drug-health-product-review-approval/profile-public-release-clinical-information-guidance/document.html (2019) Karr et al. [2006] Karr, A.F., Kohnen, C.N., Oganian, A., Reiter, J.P., Sanil, A.P.: A framework for evaluating the utility of data altered to protect confidentiality. The American Statistician 60(3), 224–232 (2006) https://doi.org/10.1198/000313006x124640 Zhu et al. [2022] Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) 
IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). https://doi.org/10.1109/BigData55660.2022.10020639 Villani [2009] Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. [2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. 
[2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Drechsler, J., Reiter, J.P.: An empirical evaluation of easily implemented, nonparametric methods for generating synthetic datasets. Comput. Stat. Data Anal. 55(12), 3232–3243 (2011) European Medicines Agency [2018] European Medicines Agency: External guidance on the implementation of the European Medicines Agency policy on the publication of clinical data for medicinal products for human use. https://www.ema.europa.eu/en/human-regulatory/marketing-authorisation/clinical-data-publication/support-industry/external-guidance-implementation-european-medicines-agency-policy-publication-clinical-data (2018) Health Canada [2019] Health Canada: Public Release of Clinical Information: guidance document. 
https://www.canada.ca/en/health-canada/services/drug-health-product-review-approval/profile-public-release-clinical-information-guidance/document.html (2019) Karr et al. [2006] Karr, A.F., Kohnen, C.N., Oganian, A., Reiter, J.P., Sanil, A.P.: A framework for evaluating the utility of data altered to protect confidentiality. The American Statistician 60(3), 224–232 (2006) https://doi.org/10.1198/000313006x124640 Zhu et al. [2022] Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). https://doi.org/10.1109/BigData55660.2022.10020639 Villani [2009] Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. [2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. 
arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. 
JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 European Medicines Agency: External guidance on the implementation of the European Medicines Agency policy on the publication of clinical data for medicinal products for human use. https://www.ema.europa.eu/en/human-regulatory/marketing-authorisation/clinical-data-publication/support-industry/external-guidance-implementation-european-medicines-agency-policy-publication-clinical-data (2018) Health Canada [2019] Health Canada: Public Release of Clinical Information: guidance document. https://www.canada.ca/en/health-canada/services/drug-health-product-review-approval/profile-public-release-clinical-information-guidance/document.html (2019) Karr et al. [2006] Karr, A.F., Kohnen, C.N., Oganian, A., Reiter, J.P., Sanil, A.P.: A framework for evaluating the utility of data altered to protect confidentiality. The American Statistician 60(3), 224–232 (2006) https://doi.org/10.1198/000313006x124640 Zhu et al. [2022] Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). https://doi.org/10.1109/BigData55660.2022.10020639 Villani [2009] Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. [2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. 
IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. 
[2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Health Canada: Public Release of Clinical Information: guidance document. https://www.canada.ca/en/health-canada/services/drug-health-product-review-approval/profile-public-release-clinical-information-guidance/document.html (2019) Karr et al. [2006] Karr, A.F., Kohnen, C.N., Oganian, A., Reiter, J.P., Sanil, A.P.: A framework for evaluating the utility of data altered to protect confidentiality. The American Statistician 60(3), 224–232 (2006) https://doi.org/10.1198/000313006x124640 Zhu et al. [2022] Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). https://doi.org/10.1109/BigData55660.2022.10020639 Villani [2009] Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. [2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. 
  20. Murtaza, H., Ahmed, M., Khan, N.F., Murtaza, G., Zafar, S., Bano, A.: Synthetic data generation: State of the art in health care domain. Computer Science Review 48, 100546 (2023) https://doi.org/10.1016/j.cosrev.2023.100546
13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Goodfellow, I.J., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., Bengio, Y.: Generative Adversarial Networks. arXiv, preprint (2014). https://doi.org/10.48550/arXiv.1406.2661 Sun and Erath [2015] Sun, L., Erath, A.: A bayesian network approach for population synthesis. Transportation Research Part C: Emerging Technologies 61, 49–62 (2015) https://doi.org/10.1016/j.trc.2015.10.010 Reiter [2005] Reiter, J.P.: Using cart to generate partially synthetic public use microdata. Journal of Official Statistics 21(3), 441–462 (2005) Drechsler and Reiter [2011] Drechsler, J., Reiter, J.P.: An empirical evaluation of easily implemented, nonparametric methods for generating synthetic datasets. Comput. Stat. Data Anal. 55(12), 3232–3243 (2011) European Medicines Agency [2018] European Medicines Agency: External guidance on the implementation of the European Medicines Agency policy on the publication of clinical data for medicinal products for human use. 
https://www.ema.europa.eu/en/human-regulatory/marketing-authorisation/clinical-data-publication/support-industry/external-guidance-implementation-european-medicines-agency-policy-publication-clinical-data (2018) Health Canada [2019] Health Canada: Public Release of Clinical Information: guidance document. https://www.canada.ca/en/health-canada/services/drug-health-product-review-approval/profile-public-release-clinical-information-guidance/document.html (2019) Karr et al. [2006] Karr, A.F., Kohnen, C.N., Oganian, A., Reiter, J.P., Sanil, A.P.: A framework for evaluating the utility of data altered to protect confidentiality. The American Statistician 60(3), 224–232 (2006) https://doi.org/10.1198/000313006x124640 Zhu et al. [2022] Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). https://doi.org/10.1109/BigData55660.2022.10020639 Villani [2009] Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. [2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. 
[2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. 
[2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Sun, L., Erath, A.: A bayesian network approach for population synthesis. Transportation Research Part C: Emerging Technologies 61, 49–62 (2015) https://doi.org/10.1016/j.trc.2015.10.010 Reiter [2005] Reiter, J.P.: Using cart to generate partially synthetic public use microdata. Journal of Official Statistics 21(3), 441–462 (2005) Drechsler and Reiter [2011] Drechsler, J., Reiter, J.P.: An empirical evaluation of easily implemented, nonparametric methods for generating synthetic datasets. Comput. Stat. Data Anal. 55(12), 3232–3243 (2011) European Medicines Agency [2018] European Medicines Agency: External guidance on the implementation of the European Medicines Agency policy on the publication of clinical data for medicinal products for human use. https://www.ema.europa.eu/en/human-regulatory/marketing-authorisation/clinical-data-publication/support-industry/external-guidance-implementation-european-medicines-agency-policy-publication-clinical-data (2018) Health Canada [2019] Health Canada: Public Release of Clinical Information: guidance document. https://www.canada.ca/en/health-canada/services/drug-health-product-review-approval/profile-public-release-clinical-information-guidance/document.html (2019) Karr et al. [2006] Karr, A.F., Kohnen, C.N., Oganian, A., Reiter, J.P., Sanil, A.P.: A framework for evaluating the utility of data altered to protect confidentiality. The American Statistician 60(3), 224–232 (2006) https://doi.org/10.1198/000313006x124640 Zhu et al. [2022] Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. 
IEEE, Osaka, Japan, December 17-20 (2022). https://doi.org/10.1109/BigData55660.2022.10020639 Villani [2009] Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. [2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. 
[2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Reiter, J.P.: Using cart to generate partially synthetic public use microdata. Journal of Official Statistics 21(3), 441–462 (2005) Drechsler and Reiter [2011] Drechsler, J., Reiter, J.P.: An empirical evaluation of easily implemented, nonparametric methods for generating synthetic datasets. Comput. Stat. Data Anal. 55(12), 3232–3243 (2011) European Medicines Agency [2018] European Medicines Agency: External guidance on the implementation of the European Medicines Agency policy on the publication of clinical data for medicinal products for human use. 
https://www.ema.europa.eu/en/human-regulatory/marketing-authorisation/clinical-data-publication/support-industry/external-guidance-implementation-european-medicines-agency-policy-publication-clinical-data (2018) Health Canada [2019] Health Canada: Public Release of Clinical Information: guidance document. https://www.canada.ca/en/health-canada/services/drug-health-product-review-approval/profile-public-release-clinical-information-guidance/document.html (2019) Karr et al. [2006] Karr, A.F., Kohnen, C.N., Oganian, A., Reiter, J.P., Sanil, A.P.: A framework for evaluating the utility of data altered to protect confidentiality. The American Statistician 60(3), 224–232 (2006) https://doi.org/10.1198/000313006x124640 Zhu et al. [2022] Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). https://doi.org/10.1109/BigData55660.2022.10020639 Villani [2009] Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. [2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. 
[2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. 
[2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Drechsler, J., Reiter, J.P.: An empirical evaluation of easily implemented, nonparametric methods for generating synthetic datasets. Comput. Stat. Data Anal. 55(12), 3232–3243 (2011) European Medicines Agency [2018] European Medicines Agency: External guidance on the implementation of the European Medicines Agency policy on the publication of clinical data for medicinal products for human use. https://www.ema.europa.eu/en/human-regulatory/marketing-authorisation/clinical-data-publication/support-industry/external-guidance-implementation-european-medicines-agency-policy-publication-clinical-data (2018) Health Canada [2019] Health Canada: Public Release of Clinical Information: guidance document. https://www.canada.ca/en/health-canada/services/drug-health-product-review-approval/profile-public-release-clinical-information-guidance/document.html (2019) Karr et al. [2006] Karr, A.F., Kohnen, C.N., Oganian, A., Reiter, J.P., Sanil, A.P.: A framework for evaluating the utility of data altered to protect confidentiality. The American Statistician 60(3), 224–232 (2006) https://doi.org/10.1198/000313006x124640 Zhu et al. [2022] Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). https://doi.org/10.1109/BigData55660.2022.10020639 Villani [2009] Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. [2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. 
In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. 
[2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 European Medicines Agency: External guidance on the implementation of the European Medicines Agency policy on the publication of clinical data for medicinal products for human use. https://www.ema.europa.eu/en/human-regulatory/marketing-authorisation/clinical-data-publication/support-industry/external-guidance-implementation-european-medicines-agency-policy-publication-clinical-data (2018) Health Canada [2019] Health Canada: Public Release of Clinical Information: guidance document. https://www.canada.ca/en/health-canada/services/drug-health-product-review-approval/profile-public-release-clinical-information-guidance/document.html (2019) Karr et al. [2006] Karr, A.F., Kohnen, C.N., Oganian, A., Reiter, J.P., Sanil, A.P.: A framework for evaluating the utility of data altered to protect confidentiality. The American Statistician 60(3), 224–232 (2006) https://doi.org/10.1198/000313006x124640 Zhu et al. [2022] Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. 
In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). https://doi.org/10.1109/BigData55660.2022.10020639 Villani [2009] Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. [2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 
14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Health Canada: Public Release of Clinical Information: guidance document. https://www.canada.ca/en/health-canada/services/drug-health-product-review-approval/profile-public-release-clinical-information-guidance/document.html (2019) Karr et al. [2006] Karr, A.F., Kohnen, C.N., Oganian, A., Reiter, J.P., Sanil, A.P.: A framework for evaluating the utility of data altered to protect confidentiality. The American Statistician 60(3), 224–232 (2006) https://doi.org/10.1198/000313006x124640 Zhu et al. [2022] Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. 
In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). https://doi.org/10.1109/BigData55660.2022.10020639 Villani [2009] Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. [2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 
14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Karr, A.F., Kohnen, C.N., Oganian, A., Reiter, J.P., Sanil, A.P.: A framework for evaluating the utility of data altered to protect confidentiality. The American Statistician 60(3), 224–232 (2006) https://doi.org/10.1198/000313006x124640 Zhu et al. [2022] Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). 
https://doi.org/10.1109/BigData55660.2022.10020639 Villani [2009] Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. [2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. 
In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). https://doi.org/10.1109/BigData55660.2022.10020639 Villani [2009] Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. [2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. 
[2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. 
[2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. [2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. 
Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. 
[2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. 
[2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. 
[2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. 
Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). 
https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. 
[2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. 
Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. 
[2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. 
[2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. 
[2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083
  21. Lautrup, A.D., Hyrup, T., Zimek, A., Schneider-Kamp, P.: Systematised review of generative modelling tools and utility metrics for fully synthetic tabular data. [under review]
IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). https://doi.org/10.1109/BigData55660.2022.10020639 Villani [2009] Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. [2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. 
[2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Kamal, S., ElEleimy, M., Hegazy, D., Nasr, M.: Hepatitis C Virus (HCV) for Egyptian patients. UCI Machine Learning Repository, dataset (2019). https://doi.org/10.24432/c5989v Chawla et al. [2002] Chawla, N.V., Bowyer, K.W., Hall, L.O., Kegelmeyer, W.P.: SMOTE: synthetic minority over-sampling technique. J. Artif. Intell. Res. 16, 321–357 (2002) https://doi.org/10.1613/jair.953 Yoon et al. [2020] Yoon, J., Drumright, L.N., Schaar, M.: Anonymization through data synthesis using generative adversarial networks (ADS-GAN). IEEE J. Biomed. Health Informatics 24(8), 2378–2388 (2020) https://doi.org/10.1109/JBHI.2020.2980262 Xu et al. 
[2019] Xu, L., Skoularidou, M., Cuesta-Infante, A., Veeramachaneni, K.: Modeling tabular data using conditional GAN. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 7333–7343 (2019) Goodfellow et al. [2014] Goodfellow, I.J., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., Bengio, Y.: Generative Adversarial Networks. arXiv, preprint (2014). https://doi.org/10.48550/arXiv.1406.2661 Sun and Erath [2015] Sun, L., Erath, A.: A bayesian network approach for population synthesis. Transportation Research Part C: Emerging Technologies 61, 49–62 (2015) https://doi.org/10.1016/j.trc.2015.10.010 Reiter [2005] Reiter, J.P.: Using cart to generate partially synthetic public use microdata. Journal of Official Statistics 21(3), 441–462 (2005) Drechsler and Reiter [2011] Drechsler, J., Reiter, J.P.: An empirical evaluation of easily implemented, nonparametric methods for generating synthetic datasets. Comput. Stat. Data Anal. 55(12), 3232–3243 (2011) European Medicines Agency [2018] European Medicines Agency: External guidance on the implementation of the European Medicines Agency policy on the publication of clinical data for medicinal products for human use. https://www.ema.europa.eu/en/human-regulatory/marketing-authorisation/clinical-data-publication/support-industry/external-guidance-implementation-european-medicines-agency-policy-publication-clinical-data (2018) Health Canada [2019] Health Canada: Public Release of Clinical Information: guidance document. https://www.canada.ca/en/health-canada/services/drug-health-product-review-approval/profile-public-release-clinical-information-guidance/document.html (2019) Karr et al. 
[2006] Karr, A.F., Kohnen, C.N., Oganian, A., Reiter, J.P., Sanil, A.P.: A framework for evaluating the utility of data altered to protect confidentiality. The American Statistician 60(3), 224–232 (2006) https://doi.org/10.1198/000313006x124640 Zhu et al. [2022] Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). https://doi.org/10.1109/BigData55660.2022.10020639 Villani [2009] Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. [2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. 
[2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Chawla, N.V., Bowyer, K.W., Hall, L.O., Kegelmeyer, W.P.: SMOTE: synthetic minority over-sampling technique. J. Artif. Intell. Res. 16, 321–357 (2002) https://doi.org/10.1613/jair.953 Yoon et al. 
[2020] Yoon, J., Drumright, L.N., Schaar, M.: Anonymization through data synthesis using generative adversarial networks (ADS-GAN). IEEE J. Biomed. Health Informatics 24(8), 2378–2388 (2020) https://doi.org/10.1109/JBHI.2020.2980262 Xu et al. [2019] Xu, L., Skoularidou, M., Cuesta-Infante, A., Veeramachaneni, K.: Modeling tabular data using conditional GAN. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 7333–7343 (2019) Goodfellow et al. [2014] Goodfellow, I.J., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., Bengio, Y.: Generative Adversarial Networks. arXiv, preprint (2014). https://doi.org/10.48550/arXiv.1406.2661 Sun and Erath [2015] Sun, L., Erath, A.: A bayesian network approach for population synthesis. Transportation Research Part C: Emerging Technologies 61, 49–62 (2015) https://doi.org/10.1016/j.trc.2015.10.010 Reiter [2005] Reiter, J.P.: Using cart to generate partially synthetic public use microdata. Journal of Official Statistics 21(3), 441–462 (2005) Drechsler and Reiter [2011] Drechsler, J., Reiter, J.P.: An empirical evaluation of easily implemented, nonparametric methods for generating synthetic datasets. Comput. Stat. Data Anal. 55(12), 3232–3243 (2011) European Medicines Agency [2018] European Medicines Agency: External guidance on the implementation of the European Medicines Agency policy on the publication of clinical data for medicinal products for human use. https://www.ema.europa.eu/en/human-regulatory/marketing-authorisation/clinical-data-publication/support-industry/external-guidance-implementation-european-medicines-agency-policy-publication-clinical-data (2018) Health Canada [2019] Health Canada: Public Release of Clinical Information: guidance document. 
https://www.canada.ca/en/health-canada/services/drug-health-product-review-approval/profile-public-release-clinical-information-guidance/document.html (2019) Karr et al. [2006] Karr, A.F., Kohnen, C.N., Oganian, A., Reiter, J.P., Sanil, A.P.: A framework for evaluating the utility of data altered to protect confidentiality. The American Statistician 60(3), 224–232 (2006) https://doi.org/10.1198/000313006x124640 Zhu et al. [2022] Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). https://doi.org/10.1109/BigData55660.2022.10020639 Villani [2009] Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. [2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. 
arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Yoon, J., Drumright, L.N., Schaar, M.: Anonymization through data synthesis using generative adversarial networks (ADS-GAN). IEEE J. 
Biomed. Health Informatics 24(8), 2378–2388 (2020) https://doi.org/10.1109/JBHI.2020.2980262 Xu et al. [2019] Xu, L., Skoularidou, M., Cuesta-Infante, A., Veeramachaneni, K.: Modeling tabular data using conditional GAN. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 7333–7343 (2019) Goodfellow et al. [2014] Goodfellow, I.J., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., Bengio, Y.: Generative Adversarial Networks. arXiv, preprint (2014). https://doi.org/10.48550/arXiv.1406.2661 Sun and Erath [2015] Sun, L., Erath, A.: A bayesian network approach for population synthesis. Transportation Research Part C: Emerging Technologies 61, 49–62 (2015) https://doi.org/10.1016/j.trc.2015.10.010 Reiter [2005] Reiter, J.P.: Using cart to generate partially synthetic public use microdata. Journal of Official Statistics 21(3), 441–462 (2005) Drechsler and Reiter [2011] Drechsler, J., Reiter, J.P.: An empirical evaluation of easily implemented, nonparametric methods for generating synthetic datasets. Comput. Stat. Data Anal. 55(12), 3232–3243 (2011) European Medicines Agency [2018] European Medicines Agency: External guidance on the implementation of the European Medicines Agency policy on the publication of clinical data for medicinal products for human use. https://www.ema.europa.eu/en/human-regulatory/marketing-authorisation/clinical-data-publication/support-industry/external-guidance-implementation-european-medicines-agency-policy-publication-clinical-data (2018) Health Canada [2019] Health Canada: Public Release of Clinical Information: guidance document. 
https://www.canada.ca/en/health-canada/services/drug-health-product-review-approval/profile-public-release-clinical-information-guidance/document.html (2019) Karr et al. [2006] Karr, A.F., Kohnen, C.N., Oganian, A., Reiter, J.P., Sanil, A.P.: A framework for evaluating the utility of data altered to protect confidentiality. The American Statistician 60(3), 224–232 (2006) https://doi.org/10.1198/000313006x124640 Zhu et al. [2022] Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). https://doi.org/10.1109/BigData55660.2022.10020639 Villani [2009] Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. [2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. 
arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Xu, L., Skoularidou, M., Cuesta-Infante, A., Veeramachaneni, K.: Modeling tabular data using conditional GAN. 
In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 7333–7343 (2019) Goodfellow et al. [2014] Goodfellow, I.J., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., Bengio, Y.: Generative Adversarial Networks. arXiv, preprint (2014). https://doi.org/10.48550/arXiv.1406.2661 Sun and Erath [2015] Sun, L., Erath, A.: A bayesian network approach for population synthesis. Transportation Research Part C: Emerging Technologies 61, 49–62 (2015) https://doi.org/10.1016/j.trc.2015.10.010 Reiter [2005] Reiter, J.P.: Using cart to generate partially synthetic public use microdata. Journal of Official Statistics 21(3), 441–462 (2005) Drechsler and Reiter [2011] Drechsler, J., Reiter, J.P.: An empirical evaluation of easily implemented, nonparametric methods for generating synthetic datasets. Comput. Stat. Data Anal. 55(12), 3232–3243 (2011) European Medicines Agency [2018] European Medicines Agency: External guidance on the implementation of the European Medicines Agency policy on the publication of clinical data for medicinal products for human use. https://www.ema.europa.eu/en/human-regulatory/marketing-authorisation/clinical-data-publication/support-industry/external-guidance-implementation-european-medicines-agency-policy-publication-clinical-data (2018) Health Canada [2019] Health Canada: Public Release of Clinical Information: guidance document. https://www.canada.ca/en/health-canada/services/drug-health-product-review-approval/profile-public-release-clinical-information-guidance/document.html (2019) Karr et al. [2006] Karr, A.F., Kohnen, C.N., Oganian, A., Reiter, J.P., Sanil, A.P.: A framework for evaluating the utility of data altered to protect confidentiality. 
The American Statistician 60(3), 224–232 (2006) https://doi.org/10.1198/000313006x124640 Zhu et al. [2022] Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). https://doi.org/10.1109/BigData55660.2022.10020639 Villani [2009] Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. [2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. 
14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. 
[2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. 
JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. 
[2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. 
[2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. 
[2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. 
[2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. 
JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. 
AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083
  22. Gower, J.C.: A general coefficient of similarity and some of its properties. Biometrics 27(4), 857–871 (1971) https://doi.org/10.2307/2528823
Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. [2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). 
IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Yoon, J., Drumright, L.N., Schaar, M.: Anonymization through data synthesis using generative adversarial networks (ADS-GAN). IEEE J. Biomed. Health Informatics 24(8), 2378–2388 (2020) https://doi.org/10.1109/JBHI.2020.2980262 Xu et al. [2019] Xu, L., Skoularidou, M., Cuesta-Infante, A., Veeramachaneni, K.: Modeling tabular data using conditional GAN. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 7333–7343 (2019) Goodfellow et al. [2014] Goodfellow, I.J., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., Bengio, Y.: Generative Adversarial Networks. arXiv, preprint (2014). https://doi.org/10.48550/arXiv.1406.2661 Sun and Erath [2015] Sun, L., Erath, A.: A bayesian network approach for population synthesis. 
Transportation Research Part C: Emerging Technologies 61, 49–62 (2015) https://doi.org/10.1016/j.trc.2015.10.010 Reiter [2005] Reiter, J.P.: Using cart to generate partially synthetic public use microdata. Journal of Official Statistics 21(3), 441–462 (2005) Drechsler and Reiter [2011] Drechsler, J., Reiter, J.P.: An empirical evaluation of easily implemented, nonparametric methods for generating synthetic datasets. Comput. Stat. Data Anal. 55(12), 3232–3243 (2011) European Medicines Agency [2018] European Medicines Agency: External guidance on the implementation of the European Medicines Agency policy on the publication of clinical data for medicinal products for human use. https://www.ema.europa.eu/en/human-regulatory/marketing-authorisation/clinical-data-publication/support-industry/external-guidance-implementation-european-medicines-agency-policy-publication-clinical-data (2018) Health Canada [2019] Health Canada: Public Release of Clinical Information: guidance document. https://www.canada.ca/en/health-canada/services/drug-health-product-review-approval/profile-public-release-clinical-information-guidance/document.html (2019) Karr et al. [2006] Karr, A.F., Kohnen, C.N., Oganian, A., Reiter, J.P., Sanil, A.P.: A framework for evaluating the utility of data altered to protect confidentiality. The American Statistician 60(3), 224–232 (2006) https://doi.org/10.1198/000313006x124640 Zhu et al. [2022] Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). https://doi.org/10.1109/BigData55660.2022.10020639 Villani [2009] Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. 
[2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). 
https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Xu, L., Skoularidou, M., Cuesta-Infante, A., Veeramachaneni, K.: Modeling tabular data using conditional GAN. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 7333–7343 (2019) Goodfellow et al. [2014] Goodfellow, I.J., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., Bengio, Y.: Generative Adversarial Networks. arXiv, preprint (2014). https://doi.org/10.48550/arXiv.1406.2661 Sun and Erath [2015] Sun, L., Erath, A.: A bayesian network approach for population synthesis. Transportation Research Part C: Emerging Technologies 61, 49–62 (2015) https://doi.org/10.1016/j.trc.2015.10.010 Reiter [2005] Reiter, J.P.: Using cart to generate partially synthetic public use microdata. 
Journal of Official Statistics 21(3), 441–462 (2005) Drechsler and Reiter [2011] Drechsler, J., Reiter, J.P.: An empirical evaluation of easily implemented, nonparametric methods for generating synthetic datasets. Comput. Stat. Data Anal. 55(12), 3232–3243 (2011) European Medicines Agency [2018] European Medicines Agency: External guidance on the implementation of the European Medicines Agency policy on the publication of clinical data for medicinal products for human use. https://www.ema.europa.eu/en/human-regulatory/marketing-authorisation/clinical-data-publication/support-industry/external-guidance-implementation-european-medicines-agency-policy-publication-clinical-data (2018) Health Canada [2019] Health Canada: Public Release of Clinical Information: guidance document. https://www.canada.ca/en/health-canada/services/drug-health-product-review-approval/profile-public-release-clinical-information-guidance/document.html (2019) Karr et al. [2006] Karr, A.F., Kohnen, C.N., Oganian, A., Reiter, J.P., Sanil, A.P.: A framework for evaluating the utility of data altered to protect confidentiality. The American Statistician 60(3), 224–232 (2006) https://doi.org/10.1198/000313006x124640 Zhu et al. [2022] Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). https://doi.org/10.1109/BigData55660.2022.10020639 Villani [2009] Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. [2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. 
Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 
13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Goodfellow, I.J., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., Bengio, Y.: Generative Adversarial Networks. arXiv, preprint (2014). https://doi.org/10.48550/arXiv.1406.2661 Sun and Erath [2015] Sun, L., Erath, A.: A bayesian network approach for population synthesis. Transportation Research Part C: Emerging Technologies 61, 49–62 (2015) https://doi.org/10.1016/j.trc.2015.10.010 Reiter [2005] Reiter, J.P.: Using cart to generate partially synthetic public use microdata. Journal of Official Statistics 21(3), 441–462 (2005) Drechsler and Reiter [2011] Drechsler, J., Reiter, J.P.: An empirical evaluation of easily implemented, nonparametric methods for generating synthetic datasets. Comput. Stat. Data Anal. 55(12), 3232–3243 (2011) European Medicines Agency [2018] European Medicines Agency: External guidance on the implementation of the European Medicines Agency policy on the publication of clinical data for medicinal products for human use. 
https://www.ema.europa.eu/en/human-regulatory/marketing-authorisation/clinical-data-publication/support-industry/external-guidance-implementation-european-medicines-agency-policy-publication-clinical-data (2018) Health Canada [2019] Health Canada: Public Release of Clinical Information: guidance document. https://www.canada.ca/en/health-canada/services/drug-health-product-review-approval/profile-public-release-clinical-information-guidance/document.html (2019) Karr et al. [2006] Karr, A.F., Kohnen, C.N., Oganian, A., Reiter, J.P., Sanil, A.P.: A framework for evaluating the utility of data altered to protect confidentiality. The American Statistician 60(3), 224–232 (2006) https://doi.org/10.1198/000313006x124640 Zhu et al. [2022] Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). https://doi.org/10.1109/BigData55660.2022.10020639 Villani [2009] Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. [2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. 
[2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. 
[2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Sun, L., Erath, A.: A bayesian network approach for population synthesis. Transportation Research Part C: Emerging Technologies 61, 49–62 (2015) https://doi.org/10.1016/j.trc.2015.10.010 Reiter [2005] Reiter, J.P.: Using cart to generate partially synthetic public use microdata. Journal of Official Statistics 21(3), 441–462 (2005) Drechsler and Reiter [2011] Drechsler, J., Reiter, J.P.: An empirical evaluation of easily implemented, nonparametric methods for generating synthetic datasets. Comput. Stat. Data Anal. 55(12), 3232–3243 (2011) European Medicines Agency [2018] European Medicines Agency: External guidance on the implementation of the European Medicines Agency policy on the publication of clinical data for medicinal products for human use. https://www.ema.europa.eu/en/human-regulatory/marketing-authorisation/clinical-data-publication/support-industry/external-guidance-implementation-european-medicines-agency-policy-publication-clinical-data (2018) Health Canada [2019] Health Canada: Public Release of Clinical Information: guidance document. https://www.canada.ca/en/health-canada/services/drug-health-product-review-approval/profile-public-release-clinical-information-guidance/document.html (2019) Karr et al. [2006] Karr, A.F., Kohnen, C.N., Oganian, A., Reiter, J.P., Sanil, A.P.: A framework for evaluating the utility of data altered to protect confidentiality. The American Statistician 60(3), 224–232 (2006) https://doi.org/10.1198/000313006x124640 Zhu et al. [2022] Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. 
IEEE, Osaka, Japan, December 17-20 (2022). https://doi.org/10.1109/BigData55660.2022.10020639 Villani [2009] Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. [2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. 
[2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Reiter, J.P.: Using cart to generate partially synthetic public use microdata. Journal of Official Statistics 21(3), 441–462 (2005) Drechsler and Reiter [2011] Drechsler, J., Reiter, J.P.: An empirical evaluation of easily implemented, nonparametric methods for generating synthetic datasets. Comput. Stat. Data Anal. 55(12), 3232–3243 (2011) European Medicines Agency [2018] European Medicines Agency: External guidance on the implementation of the European Medicines Agency policy on the publication of clinical data for medicinal products for human use. 
[2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. 
[2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. 
[2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083
  23. Kamal, S., ElEleimy, M., Hegazy, D., Nasr, M.: Hepatitis C Virus (HCV) for Egyptian patients. UCI Machine Learning Repository, dataset (2019). https://doi.org/10.24432/c5989v
Transportation Research Part C: Emerging Technologies 61, 49–62 (2015) https://doi.org/10.1016/j.trc.2015.10.010 Reiter [2005] Reiter, J.P.: Using cart to generate partially synthetic public use microdata. Journal of Official Statistics 21(3), 441–462 (2005) Drechsler and Reiter [2011] Drechsler, J., Reiter, J.P.: An empirical evaluation of easily implemented, nonparametric methods for generating synthetic datasets. Comput. Stat. Data Anal. 55(12), 3232–3243 (2011) European Medicines Agency [2018] European Medicines Agency: External guidance on the implementation of the European Medicines Agency policy on the publication of clinical data for medicinal products for human use. https://www.ema.europa.eu/en/human-regulatory/marketing-authorisation/clinical-data-publication/support-industry/external-guidance-implementation-european-medicines-agency-policy-publication-clinical-data (2018) Health Canada [2019] Health Canada: Public Release of Clinical Information: guidance document. https://www.canada.ca/en/health-canada/services/drug-health-product-review-approval/profile-public-release-clinical-information-guidance/document.html (2019) Karr et al. [2006] Karr, A.F., Kohnen, C.N., Oganian, A., Reiter, J.P., Sanil, A.P.: A framework for evaluating the utility of data altered to protect confidentiality. The American Statistician 60(3), 224–232 (2006) https://doi.org/10.1198/000313006x124640 Zhu et al. [2022] Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). https://doi.org/10.1109/BigData55660.2022.10020639 Villani [2009] Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. 
[2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). 
https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Reiter, J.P.: Using cart to generate partially synthetic public use microdata. Journal of Official Statistics 21(3), 441–462 (2005) Drechsler and Reiter [2011] Drechsler, J., Reiter, J.P.: An empirical evaluation of easily implemented, nonparametric methods for generating synthetic datasets. Comput. Stat. Data Anal. 55(12), 3232–3243 (2011) European Medicines Agency [2018] European Medicines Agency: External guidance on the implementation of the European Medicines Agency policy on the publication of clinical data for medicinal products for human use. https://www.ema.europa.eu/en/human-regulatory/marketing-authorisation/clinical-data-publication/support-industry/external-guidance-implementation-european-medicines-agency-policy-publication-clinical-data (2018) Health Canada [2019] Health Canada: Public Release of Clinical Information: guidance document. https://www.canada.ca/en/health-canada/services/drug-health-product-review-approval/profile-public-release-clinical-information-guidance/document.html (2019) Karr et al. 
[2006] Karr, A.F., Kohnen, C.N., Oganian, A., Reiter, J.P., Sanil, A.P.: A framework for evaluating the utility of data altered to protect confidentiality. The American Statistician 60(3), 224–232 (2006) https://doi.org/10.1198/000313006x124640 Zhu et al. [2022] Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). https://doi.org/10.1109/BigData55660.2022.10020639 Villani [2009] Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. [2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. 
[2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Drechsler, J., Reiter, J.P.: An empirical evaluation of easily implemented, nonparametric methods for generating synthetic datasets. Comput. Stat. Data Anal. 
55(12), 3232–3243 (2011) European Medicines Agency [2018] European Medicines Agency: External guidance on the implementation of the European Medicines Agency policy on the publication of clinical data for medicinal products for human use. https://www.ema.europa.eu/en/human-regulatory/marketing-authorisation/clinical-data-publication/support-industry/external-guidance-implementation-european-medicines-agency-policy-publication-clinical-data (2018) Health Canada [2019] Health Canada: Public Release of Clinical Information: guidance document. https://www.canada.ca/en/health-canada/services/drug-health-product-review-approval/profile-public-release-clinical-information-guidance/document.html (2019) Karr et al. [2006] Karr, A.F., Kohnen, C.N., Oganian, A., Reiter, J.P., Sanil, A.P.: A framework for evaluating the utility of data altered to protect confidentiality. The American Statistician 60(3), 224–232 (2006) https://doi.org/10.1198/000313006x124640 Zhu et al. [2022] Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). https://doi.org/10.1109/BigData55660.2022.10020639 Villani [2009] Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. [2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. 
IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. 
[2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 European Medicines Agency: External guidance on the implementation of the European Medicines Agency policy on the publication of clinical data for medicinal products for human use. https://www.ema.europa.eu/en/human-regulatory/marketing-authorisation/clinical-data-publication/support-industry/external-guidance-implementation-european-medicines-agency-policy-publication-clinical-data (2018) Health Canada [2019] Health Canada: Public Release of Clinical Information: guidance document. https://www.canada.ca/en/health-canada/services/drug-health-product-review-approval/profile-public-release-clinical-information-guidance/document.html (2019) Karr et al. [2006] Karr, A.F., Kohnen, C.N., Oganian, A., Reiter, J.P., Sanil, A.P.: A framework for evaluating the utility of data altered to protect confidentiality. The American Statistician 60(3), 224–232 (2006) https://doi.org/10.1198/000313006x124640 Zhu et al. [2022] Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). https://doi.org/10.1109/BigData55660.2022.10020639 Villani [2009] Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. 
[2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). 
https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Health Canada: Public Release of Clinical Information: guidance document. https://www.canada.ca/en/health-canada/services/drug-health-product-review-approval/profile-public-release-clinical-information-guidance/document.html (2019) Karr et al. [2006] Karr, A.F., Kohnen, C.N., Oganian, A., Reiter, J.P., Sanil, A.P.: A framework for evaluating the utility of data altered to protect confidentiality. The American Statistician 60(3), 224–232 (2006) https://doi.org/10.1198/000313006x124640 Zhu et al. [2022] Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). https://doi.org/10.1109/BigData55660.2022.10020639 Villani [2009] Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. 
[2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). 
https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Karr, A.F., Kohnen, C.N., Oganian, A., Reiter, J.P., Sanil, A.P.: A framework for evaluating the utility of data altered to protect confidentiality. The American Statistician 60(3), 224–232 (2006) https://doi.org/10.1198/000313006x124640 Zhu et al. [2022] Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). https://doi.org/10.1109/BigData55660.2022.10020639 Villani [2009] Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. [2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). 
Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. 
[2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). https://doi.org/10.1109/BigData55660.2022.10020639 Villani [2009] Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. [2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. 
[2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. 
[2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. [2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. 
[2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. 
[2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. 
[2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. 
[2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. 
[2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. 
Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. 
Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. 
Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. 
[2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. 
[2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. 
[2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083
  24. Chawla, N.V., Bowyer, K.W., Hall, L.O., Kegelmeyer, W.P.: SMOTE: synthetic minority over-sampling technique. J. Artif. Intell. Res. 16, 321–357 (2002) https://doi.org/10.1613/jair.953
arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. 
JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Goodfellow, I.J., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., Bengio, Y.: Generative Adversarial Networks. arXiv, preprint (2014). https://doi.org/10.48550/arXiv.1406.2661 Sun and Erath [2015] Sun, L., Erath, A.: A bayesian network approach for population synthesis. Transportation Research Part C: Emerging Technologies 61, 49–62 (2015) https://doi.org/10.1016/j.trc.2015.10.010 Reiter [2005] Reiter, J.P.: Using cart to generate partially synthetic public use microdata. Journal of Official Statistics 21(3), 441–462 (2005) Drechsler and Reiter [2011] Drechsler, J., Reiter, J.P.: An empirical evaluation of easily implemented, nonparametric methods for generating synthetic datasets. Comput. Stat. Data Anal. 55(12), 3232–3243 (2011) European Medicines Agency [2018] European Medicines Agency: External guidance on the implementation of the European Medicines Agency policy on the publication of clinical data for medicinal products for human use. https://www.ema.europa.eu/en/human-regulatory/marketing-authorisation/clinical-data-publication/support-industry/external-guidance-implementation-european-medicines-agency-policy-publication-clinical-data (2018) Health Canada [2019] Health Canada: Public Release of Clinical Information: guidance document. https://www.canada.ca/en/health-canada/services/drug-health-product-review-approval/profile-public-release-clinical-information-guidance/document.html (2019) Karr et al. [2006] Karr, A.F., Kohnen, C.N., Oganian, A., Reiter, J.P., Sanil, A.P.: A framework for evaluating the utility of data altered to protect confidentiality. The American Statistician 60(3), 224–232 (2006) https://doi.org/10.1198/000313006x124640 Zhu et al. [2022] Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. 
In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). https://doi.org/10.1109/BigData55660.2022.10020639 Villani [2009] Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. [2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 
14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Sun, L., Erath, A.: A bayesian network approach for population synthesis. Transportation Research Part C: Emerging Technologies 61, 49–62 (2015) https://doi.org/10.1016/j.trc.2015.10.010 Reiter [2005] Reiter, J.P.: Using cart to generate partially synthetic public use microdata. Journal of Official Statistics 21(3), 441–462 (2005) Drechsler and Reiter [2011] Drechsler, J., Reiter, J.P.: An empirical evaluation of easily implemented, nonparametric methods for generating synthetic datasets. Comput. Stat. Data Anal. 
55(12), 3232–3243 (2011) European Medicines Agency [2018] European Medicines Agency: External guidance on the implementation of the European Medicines Agency policy on the publication of clinical data for medicinal products for human use. https://www.ema.europa.eu/en/human-regulatory/marketing-authorisation/clinical-data-publication/support-industry/external-guidance-implementation-european-medicines-agency-policy-publication-clinical-data (2018) Health Canada [2019] Health Canada: Public Release of Clinical Information: guidance document. https://www.canada.ca/en/health-canada/services/drug-health-product-review-approval/profile-public-release-clinical-information-guidance/document.html (2019) Karr et al. [2006] Karr, A.F., Kohnen, C.N., Oganian, A., Reiter, J.P., Sanil, A.P.: A framework for evaluating the utility of data altered to protect confidentiality. The American Statistician 60(3), 224–232 (2006) https://doi.org/10.1198/000313006x124640 Zhu et al. [2022] Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). https://doi.org/10.1109/BigData55660.2022.10020639 Villani [2009] Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. [2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. 
IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. 
[2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Reiter, J.P.: Using cart to generate partially synthetic public use microdata. Journal of Official Statistics 21(3), 441–462 (2005) Drechsler and Reiter [2011] Drechsler, J., Reiter, J.P.: An empirical evaluation of easily implemented, nonparametric methods for generating synthetic datasets. Comput. Stat. Data Anal. 55(12), 3232–3243 (2011) European Medicines Agency [2018] European Medicines Agency: External guidance on the implementation of the European Medicines Agency policy on the publication of clinical data for medicinal products for human use. https://www.ema.europa.eu/en/human-regulatory/marketing-authorisation/clinical-data-publication/support-industry/external-guidance-implementation-european-medicines-agency-policy-publication-clinical-data (2018) Health Canada [2019] Health Canada: Public Release of Clinical Information: guidance document. https://www.canada.ca/en/health-canada/services/drug-health-product-review-approval/profile-public-release-clinical-information-guidance/document.html (2019) Karr et al. [2006] Karr, A.F., Kohnen, C.N., Oganian, A., Reiter, J.P., Sanil, A.P.: A framework for evaluating the utility of data altered to protect confidentiality. The American Statistician 60(3), 224–232 (2006) https://doi.org/10.1198/000313006x124640 Zhu et al. [2022] Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) 
IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). https://doi.org/10.1109/BigData55660.2022.10020639 Villani [2009] Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. [2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. 
[2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Drechsler, J., Reiter, J.P.: An empirical evaluation of easily implemented, nonparametric methods for generating synthetic datasets. Comput. Stat. Data Anal. 55(12), 3232–3243 (2011) European Medicines Agency [2018] European Medicines Agency: External guidance on the implementation of the European Medicines Agency policy on the publication of clinical data for medicinal products for human use. https://www.ema.europa.eu/en/human-regulatory/marketing-authorisation/clinical-data-publication/support-industry/external-guidance-implementation-european-medicines-agency-policy-publication-clinical-data (2018) Health Canada [2019] Health Canada: Public Release of Clinical Information: guidance document. 
https://www.canada.ca/en/health-canada/services/drug-health-product-review-approval/profile-public-release-clinical-information-guidance/document.html (2019) Karr et al. [2006] Karr, A.F., Kohnen, C.N., Oganian, A., Reiter, J.P., Sanil, A.P.: A framework for evaluating the utility of data altered to protect confidentiality. The American Statistician 60(3), 224–232 (2006) https://doi.org/10.1198/000313006x124640 Zhu et al. [2022] Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). https://doi.org/10.1109/BigData55660.2022.10020639 Villani [2009] Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. [2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. 
arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. 
JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 European Medicines Agency: External guidance on the implementation of the European Medicines Agency policy on the publication of clinical data for medicinal products for human use. https://www.ema.europa.eu/en/human-regulatory/marketing-authorisation/clinical-data-publication/support-industry/external-guidance-implementation-european-medicines-agency-policy-publication-clinical-data (2018) Health Canada [2019] Health Canada: Public Release of Clinical Information: guidance document. https://www.canada.ca/en/health-canada/services/drug-health-product-review-approval/profile-public-release-clinical-information-guidance/document.html (2019) Karr et al. [2006] Karr, A.F., Kohnen, C.N., Oganian, A., Reiter, J.P., Sanil, A.P.: A framework for evaluating the utility of data altered to protect confidentiality. The American Statistician 60(3), 224–232 (2006) https://doi.org/10.1198/000313006x124640 Zhu et al. [2022] Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). https://doi.org/10.1109/BigData55660.2022.10020639 Villani [2009] Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. [2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. 
IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. 
  25. Yoon, J., Drumright, L.N., Schaar, M.: Anonymization through data synthesis using generative adversarial networks (ADS-GAN). IEEE J. Biomed. Health Informatics 24(8), 2378–2388 (2020) https://doi.org/10.1109/JBHI.2020.2980262
The American Statistician 60(3), 224–232 (2006) https://doi.org/10.1198/000313006x124640 Zhu et al. [2022] Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). https://doi.org/10.1109/BigData55660.2022.10020639 Villani [2009] Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. [2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. 
Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Goodfellow, I.J., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., Bengio, Y.: Generative Adversarial Networks. arXiv, preprint (2014). https://doi.org/10.48550/arXiv.1406.2661 Sun and Erath [2015] Sun, L., Erath, A.: A bayesian network approach for population synthesis. 
Transportation Research Part C: Emerging Technologies 61, 49–62 (2015) https://doi.org/10.1016/j.trc.2015.10.010 Reiter [2005] Reiter, J.P.: Using cart to generate partially synthetic public use microdata. Journal of Official Statistics 21(3), 441–462 (2005) Drechsler and Reiter [2011] Drechsler, J., Reiter, J.P.: An empirical evaluation of easily implemented, nonparametric methods for generating synthetic datasets. Comput. Stat. Data Anal. 55(12), 3232–3243 (2011) European Medicines Agency [2018] European Medicines Agency: External guidance on the implementation of the European Medicines Agency policy on the publication of clinical data for medicinal products for human use. https://www.ema.europa.eu/en/human-regulatory/marketing-authorisation/clinical-data-publication/support-industry/external-guidance-implementation-european-medicines-agency-policy-publication-clinical-data (2018) Health Canada [2019] Health Canada: Public Release of Clinical Information: guidance document. https://www.canada.ca/en/health-canada/services/drug-health-product-review-approval/profile-public-release-clinical-information-guidance/document.html (2019) Karr et al. [2006] Karr, A.F., Kohnen, C.N., Oganian, A., Reiter, J.P., Sanil, A.P.: A framework for evaluating the utility of data altered to protect confidentiality. The American Statistician 60(3), 224–232 (2006) https://doi.org/10.1198/000313006x124640 Zhu et al. [2022] Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). https://doi.org/10.1109/BigData55660.2022.10020639 Villani [2009] Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. 
[2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). 
https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Sun, L., Erath, A.: A bayesian network approach for population synthesis. Transportation Research Part C: Emerging Technologies 61, 49–62 (2015) https://doi.org/10.1016/j.trc.2015.10.010 Reiter [2005] Reiter, J.P.: Using cart to generate partially synthetic public use microdata. Journal of Official Statistics 21(3), 441–462 (2005) Drechsler and Reiter [2011] Drechsler, J., Reiter, J.P.: An empirical evaluation of easily implemented, nonparametric methods for generating synthetic datasets. Comput. Stat. Data Anal. 55(12), 3232–3243 (2011) European Medicines Agency [2018] European Medicines Agency: External guidance on the implementation of the European Medicines Agency policy on the publication of clinical data for medicinal products for human use. 
https://www.ema.europa.eu/en/human-regulatory/marketing-authorisation/clinical-data-publication/support-industry/external-guidance-implementation-european-medicines-agency-policy-publication-clinical-data (2018) Health Canada [2019] Health Canada: Public Release of Clinical Information: guidance document. https://www.canada.ca/en/health-canada/services/drug-health-product-review-approval/profile-public-release-clinical-information-guidance/document.html (2019) Karr et al. [2006] Karr, A.F., Kohnen, C.N., Oganian, A., Reiter, J.P., Sanil, A.P.: A framework for evaluating the utility of data altered to protect confidentiality. The American Statistician 60(3), 224–232 (2006) https://doi.org/10.1198/000313006x124640 Zhu et al. [2022] Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). https://doi.org/10.1109/BigData55660.2022.10020639 Villani [2009] Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. [2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. 
[2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. 
[2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Reiter, J.P.: Using cart to generate partially synthetic public use microdata. Journal of Official Statistics 21(3), 441–462 (2005) Drechsler and Reiter [2011] Drechsler, J., Reiter, J.P.: An empirical evaluation of easily implemented, nonparametric methods for generating synthetic datasets. Comput. Stat. Data Anal. 55(12), 3232–3243 (2011) European Medicines Agency [2018] European Medicines Agency: External guidance on the implementation of the European Medicines Agency policy on the publication of clinical data for medicinal products for human use. https://www.ema.europa.eu/en/human-regulatory/marketing-authorisation/clinical-data-publication/support-industry/external-guidance-implementation-european-medicines-agency-policy-publication-clinical-data (2018) Health Canada [2019] Health Canada: Public Release of Clinical Information: guidance document. https://www.canada.ca/en/health-canada/services/drug-health-product-review-approval/profile-public-release-clinical-information-guidance/document.html (2019) Karr et al. [2006] Karr, A.F., Kohnen, C.N., Oganian, A., Reiter, J.P., Sanil, A.P.: A framework for evaluating the utility of data altered to protect confidentiality. The American Statistician 60(3), 224–232 (2006) https://doi.org/10.1198/000313006x124640 Zhu et al. [2022] Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). https://doi.org/10.1109/BigData55660.2022.10020639 Villani [2009] Villani, C.: Optimal Transport. Springer, Heidelberg (2009). 
https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. [2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). 
IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Drechsler, J., Reiter, J.P.: An empirical evaluation of easily implemented, nonparametric methods for generating synthetic datasets. Comput. Stat. Data Anal. 55(12), 3232–3243 (2011) European Medicines Agency [2018] European Medicines Agency: External guidance on the implementation of the European Medicines Agency policy on the publication of clinical data for medicinal products for human use. https://www.ema.europa.eu/en/human-regulatory/marketing-authorisation/clinical-data-publication/support-industry/external-guidance-implementation-european-medicines-agency-policy-publication-clinical-data (2018) Health Canada [2019] Health Canada: Public Release of Clinical Information: guidance document. https://www.canada.ca/en/health-canada/services/drug-health-product-review-approval/profile-public-release-clinical-information-guidance/document.html (2019) Karr et al. 
[2006] Karr, A.F., Kohnen, C.N., Oganian, A., Reiter, J.P., Sanil, A.P.: A framework for evaluating the utility of data altered to protect confidentiality. The American Statistician 60(3), 224–232 (2006) https://doi.org/10.1198/000313006x124640 Zhu et al. [2022] Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). https://doi.org/10.1109/BigData55660.2022.10020639 Villani [2009] Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. [2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. 
[2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 European Medicines Agency: External guidance on the implementation of the European Medicines Agency policy on the publication of clinical data for medicinal products for human use. 
https://www.ema.europa.eu/en/human-regulatory/marketing-authorisation/clinical-data-publication/support-industry/external-guidance-implementation-european-medicines-agency-policy-publication-clinical-data (2018) Health Canada [2019] Health Canada: Public Release of Clinical Information: guidance document. https://www.canada.ca/en/health-canada/services/drug-health-product-review-approval/profile-public-release-clinical-information-guidance/document.html (2019) Karr et al. [2006] Karr, A.F., Kohnen, C.N., Oganian, A., Reiter, J.P., Sanil, A.P.: A framework for evaluating the utility of data altered to protect confidentiality. The American Statistician 60(3), 224–232 (2006) https://doi.org/10.1198/000313006x124640 Zhu et al. [2022] Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). https://doi.org/10.1109/BigData55660.2022.10020639 Villani [2009] Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. [2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. 
[2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568
  26. Xu, L., Skoularidou, M., Cuesta-Infante, A., Veeramachaneni, K.: Modeling tabular data using conditional GAN. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 7333–7343 (2019)
Goodfellow et al. [2014] Goodfellow, I.J., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., Bengio, Y.: Generative Adversarial Networks. arXiv, preprint (2014). https://doi.org/10.48550/arXiv.1406.2661
Sun and Erath [2015] Sun, L., Erath, A.: A bayesian network approach for population synthesis. Transportation Research Part C: Emerging Technologies 61, 49–62 (2015) https://doi.org/10.1016/j.trc.2015.10.010
Reiter [2005] Reiter, J.P.: Using cart to generate partially synthetic public use microdata. Journal of Official Statistics 21(3), 441–462 (2005)
Drechsler and Reiter [2011] Drechsler, J., Reiter, J.P.: An empirical evaluation of easily implemented, nonparametric methods for generating synthetic datasets. Comput. Stat. Data Anal. 55(12), 3232–3243 (2011)
European Medicines Agency [2018] European Medicines Agency: External guidance on the implementation of the European Medicines Agency policy on the publication of clinical data for medicinal products for human use. https://www.ema.europa.eu/en/human-regulatory/marketing-authorisation/clinical-data-publication/support-industry/external-guidance-implementation-european-medicines-agency-policy-publication-clinical-data (2018)
Health Canada [2019] Health Canada: Public Release of Clinical Information: guidance document. https://www.canada.ca/en/health-canada/services/drug-health-product-review-approval/profile-public-release-clinical-information-guidance/document.html (2019)
Karr et al. [2006] Karr, A.F., Kohnen, C.N., Oganian, A., Reiter, J.P., Sanil, A.P.: A framework for evaluating the utility of data altered to protect confidentiality. The American Statistician 60(3), 224–232 (2006) https://doi.org/10.1198/000313006x124640
Zhu et al. [2022] Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). https://doi.org/10.1109/BigData55660.2022.10020639
Villani [2009] Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9
Hesterberg et al. [2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16
Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722
Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605
Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568
Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078
Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358
Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298
Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420
Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802
Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369
Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020)
Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083
Goodfellow, I.J., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., Bengio, Y.: Generative Adversarial Networks. arXiv, preprint (2014). 
https://doi.org/10.48550/arXiv.1406.2661 Sun and Erath [2015] Sun, L., Erath, A.: A bayesian network approach for population synthesis. Transportation Research Part C: Emerging Technologies 61, 49–62 (2015) https://doi.org/10.1016/j.trc.2015.10.010 Reiter [2005] Reiter, J.P.: Using cart to generate partially synthetic public use microdata. Journal of Official Statistics 21(3), 441–462 (2005) Drechsler and Reiter [2011] Drechsler, J., Reiter, J.P.: An empirical evaluation of easily implemented, nonparametric methods for generating synthetic datasets. Comput. Stat. Data Anal. 55(12), 3232–3243 (2011) European Medicines Agency [2018] European Medicines Agency: External guidance on the implementation of the European Medicines Agency policy on the publication of clinical data for medicinal products for human use. https://www.ema.europa.eu/en/human-regulatory/marketing-authorisation/clinical-data-publication/support-industry/external-guidance-implementation-european-medicines-agency-policy-publication-clinical-data (2018) Health Canada [2019] Health Canada: Public Release of Clinical Information: guidance document. https://www.canada.ca/en/health-canada/services/drug-health-product-review-approval/profile-public-release-clinical-information-guidance/document.html (2019) Karr et al. [2006] Karr, A.F., Kohnen, C.N., Oganian, A., Reiter, J.P., Sanil, A.P.: A framework for evaluating the utility of data altered to protect confidentiality. The American Statistician 60(3), 224–232 (2006) https://doi.org/10.1198/000313006x124640 Zhu et al. [2022] Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). https://doi.org/10.1109/BigData55660.2022.10020639 Villani [2009] Villani, C.: Optimal Transport. 
Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. [2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). 
IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Sun, L., Erath, A.: A bayesian network approach for population synthesis. Transportation Research Part C: Emerging Technologies 61, 49–62 (2015) https://doi.org/10.1016/j.trc.2015.10.010 Reiter [2005] Reiter, J.P.: Using cart to generate partially synthetic public use microdata. Journal of Official Statistics 21(3), 441–462 (2005) Drechsler and Reiter [2011] Drechsler, J., Reiter, J.P.: An empirical evaluation of easily implemented, nonparametric methods for generating synthetic datasets. Comput. Stat. Data Anal. 55(12), 3232–3243 (2011) European Medicines Agency [2018] European Medicines Agency: External guidance on the implementation of the European Medicines Agency policy on the publication of clinical data for medicinal products for human use. 
https://www.ema.europa.eu/en/human-regulatory/marketing-authorisation/clinical-data-publication/support-industry/external-guidance-implementation-european-medicines-agency-policy-publication-clinical-data (2018) Health Canada [2019] Health Canada: Public Release of Clinical Information: guidance document. https://www.canada.ca/en/health-canada/services/drug-health-product-review-approval/profile-public-release-clinical-information-guidance/document.html (2019) Karr et al. [2006] Karr, A.F., Kohnen, C.N., Oganian, A., Reiter, J.P., Sanil, A.P.: A framework for evaluating the utility of data altered to protect confidentiality. The American Statistician 60(3), 224–232 (2006) https://doi.org/10.1198/000313006x124640 Zhu et al. [2022] Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). https://doi.org/10.1109/BigData55660.2022.10020639 Villani [2009] Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. [2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. 
[2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. 
[2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Reiter, J.P.: Using cart to generate partially synthetic public use microdata. Journal of Official Statistics 21(3), 441–462 (2005) Drechsler and Reiter [2011] Drechsler, J., Reiter, J.P.: An empirical evaluation of easily implemented, nonparametric methods for generating synthetic datasets. Comput. Stat. Data Anal. 55(12), 3232–3243 (2011) European Medicines Agency [2018] European Medicines Agency: External guidance on the implementation of the European Medicines Agency policy on the publication of clinical data for medicinal products for human use. https://www.ema.europa.eu/en/human-regulatory/marketing-authorisation/clinical-data-publication/support-industry/external-guidance-implementation-european-medicines-agency-policy-publication-clinical-data (2018) Health Canada [2019] Health Canada: Public Release of Clinical Information: guidance document. https://www.canada.ca/en/health-canada/services/drug-health-product-review-approval/profile-public-release-clinical-information-guidance/document.html (2019) Karr et al. [2006] Karr, A.F., Kohnen, C.N., Oganian, A., Reiter, J.P., Sanil, A.P.: A framework for evaluating the utility of data altered to protect confidentiality. The American Statistician 60(3), 224–232 (2006) https://doi.org/10.1198/000313006x124640 Zhu et al. [2022] Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). https://doi.org/10.1109/BigData55660.2022.10020639 Villani [2009] Villani, C.: Optimal Transport. Springer, Heidelberg (2009). 
https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. [2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). 
IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Drechsler, J., Reiter, J.P.: An empirical evaluation of easily implemented, nonparametric methods for generating synthetic datasets. Comput. Stat. Data Anal. 55(12), 3232–3243 (2011) European Medicines Agency [2018] European Medicines Agency: External guidance on the implementation of the European Medicines Agency policy on the publication of clinical data for medicinal products for human use. https://www.ema.europa.eu/en/human-regulatory/marketing-authorisation/clinical-data-publication/support-industry/external-guidance-implementation-european-medicines-agency-policy-publication-clinical-data (2018) Health Canada [2019] Health Canada: Public Release of Clinical Information: guidance document. https://www.canada.ca/en/health-canada/services/drug-health-product-review-approval/profile-public-release-clinical-information-guidance/document.html (2019) Karr et al. 
[2006] Karr, A.F., Kohnen, C.N., Oganian, A., Reiter, J.P., Sanil, A.P.: A framework for evaluating the utility of data altered to protect confidentiality. The American Statistician 60(3), 224–232 (2006) https://doi.org/10.1198/000313006x124640 Zhu et al. [2022] Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). https://doi.org/10.1109/BigData55660.2022.10020639 Villani [2009] Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. [2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. 
[2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 European Medicines Agency: External guidance on the implementation of the European Medicines Agency policy on the publication of clinical data for medicinal products for human use. 
https://www.ema.europa.eu/en/human-regulatory/marketing-authorisation/clinical-data-publication/support-industry/external-guidance-implementation-european-medicines-agency-policy-publication-clinical-data (2018) Health Canada [2019] Health Canada: Public Release of Clinical Information: guidance document. https://www.canada.ca/en/health-canada/services/drug-health-product-review-approval/profile-public-release-clinical-information-guidance/document.html (2019) Karr et al. [2006] Karr, A.F., Kohnen, C.N., Oganian, A., Reiter, J.P., Sanil, A.P.: A framework for evaluating the utility of data altered to protect confidentiality. The American Statistician 60(3), 224–232 (2006) https://doi.org/10.1198/000313006x124640 Zhu et al. [2022] Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). https://doi.org/10.1109/BigData55660.2022.10020639 Villani [2009] Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. [2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. 
[2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. 
[2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Health Canada: Public Release of Clinical Information: guidance document. https://www.canada.ca/en/health-canada/services/drug-health-product-review-approval/profile-public-release-clinical-information-guidance/document.html (2019) Karr et al. [2006] Karr, A.F., Kohnen, C.N., Oganian, A., Reiter, J.P., Sanil, A.P.: A framework for evaluating the utility of data altered to protect confidentiality. The American Statistician 60(3), 224–232 (2006) https://doi.org/10.1198/000313006x124640 Zhu et al. [2022] Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). https://doi.org/10.1109/BigData55660.2022.10020639 Villani [2009] Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. [2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. 
[2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. 
[2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Karr, A.F., Kohnen, C.N., Oganian, A., Reiter, J.P., Sanil, A.P.: A framework for evaluating the utility of data altered to protect confidentiality. The American Statistician 60(3), 224–232 (2006) https://doi.org/10.1198/000313006x124640 Zhu et al. [2022] Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). https://doi.org/10.1109/BigData55660.2022.10020639 Villani [2009] Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. [2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. 
arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. 
In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). https://doi.org/10.1109/BigData55660.2022.10020639 Villani [2009] Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. [2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 
14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. [2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. 
IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. 
[2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 
14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. 
[2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. 
JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. 
[2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. 
[2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. 
[2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. 
[2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. 
JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. 
AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083
  27. Goodfellow, I.J., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., Bengio, Y.: Generative Adversarial Networks. arXiv, preprint (2014). https://doi.org/10.48550/arXiv.1406.2661
[2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Reiter, J.P.: Using cart to generate partially synthetic public use microdata. Journal of Official Statistics 21(3), 441–462 (2005) Drechsler and Reiter [2011] Drechsler, J., Reiter, J.P.: An empirical evaluation of easily implemented, nonparametric methods for generating synthetic datasets. Comput. Stat. Data Anal. 55(12), 3232–3243 (2011) European Medicines Agency [2018] European Medicines Agency: External guidance on the implementation of the European Medicines Agency policy on the publication of clinical data for medicinal products for human use. https://www.ema.europa.eu/en/human-regulatory/marketing-authorisation/clinical-data-publication/support-industry/external-guidance-implementation-european-medicines-agency-policy-publication-clinical-data (2018) Health Canada [2019] Health Canada: Public Release of Clinical Information: guidance document. https://www.canada.ca/en/health-canada/services/drug-health-product-review-approval/profile-public-release-clinical-information-guidance/document.html (2019) Karr et al. [2006] Karr, A.F., Kohnen, C.N., Oganian, A., Reiter, J.P., Sanil, A.P.: A framework for evaluating the utility of data altered to protect confidentiality. The American Statistician 60(3), 224–232 (2006) https://doi.org/10.1198/000313006x124640 Zhu et al. [2022] Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) 
IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). https://doi.org/10.1109/BigData55660.2022.10020639 Villani [2009] Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. [2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. 
[2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Drechsler, J., Reiter, J.P.: An empirical evaluation of easily implemented, nonparametric methods for generating synthetic datasets. Comput. Stat. Data Anal. 55(12), 3232–3243 (2011) European Medicines Agency [2018] European Medicines Agency: External guidance on the implementation of the European Medicines Agency policy on the publication of clinical data for medicinal products for human use. https://www.ema.europa.eu/en/human-regulatory/marketing-authorisation/clinical-data-publication/support-industry/external-guidance-implementation-european-medicines-agency-policy-publication-clinical-data (2018) Health Canada [2019] Health Canada: Public Release of Clinical Information: guidance document. 
https://www.canada.ca/en/health-canada/services/drug-health-product-review-approval/profile-public-release-clinical-information-guidance/document.html (2019) Karr et al. [2006] Karr, A.F., Kohnen, C.N., Oganian, A., Reiter, J.P., Sanil, A.P.: A framework for evaluating the utility of data altered to protect confidentiality. The American Statistician 60(3), 224–232 (2006) https://doi.org/10.1198/000313006x124640 Zhu et al. [2022] Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). https://doi.org/10.1109/BigData55660.2022.10020639 Villani [2009] Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. [2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. 
arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. 
JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 European Medicines Agency: External guidance on the implementation of the European Medicines Agency policy on the publication of clinical data for medicinal products for human use. https://www.ema.europa.eu/en/human-regulatory/marketing-authorisation/clinical-data-publication/support-industry/external-guidance-implementation-european-medicines-agency-policy-publication-clinical-data (2018) Health Canada [2019] Health Canada: Public Release of Clinical Information: guidance document. https://www.canada.ca/en/health-canada/services/drug-health-product-review-approval/profile-public-release-clinical-information-guidance/document.html (2019) Karr et al. [2006] Karr, A.F., Kohnen, C.N., Oganian, A., Reiter, J.P., Sanil, A.P.: A framework for evaluating the utility of data altered to protect confidentiality. The American Statistician 60(3), 224–232 (2006) https://doi.org/10.1198/000313006x124640 Zhu et al. [2022] Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). https://doi.org/10.1109/BigData55660.2022.10020639 Villani [2009] Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. [2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. 
IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. 
[2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Health Canada: Public Release of Clinical Information: guidance document. https://www.canada.ca/en/health-canada/services/drug-health-product-review-approval/profile-public-release-clinical-information-guidance/document.html (2019) Karr et al. [2006] Karr, A.F., Kohnen, C.N., Oganian, A., Reiter, J.P., Sanil, A.P.: A framework for evaluating the utility of data altered to protect confidentiality. The American Statistician 60(3), 224–232 (2006) https://doi.org/10.1198/000313006x124640 Zhu et al. [2022] Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). https://doi.org/10.1109/BigData55660.2022.10020639 Villani [2009] Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. [2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. 
IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. 
[2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Karr, A.F., Kohnen, C.N., Oganian, A., Reiter, J.P., Sanil, A.P.: A framework for evaluating the utility of data altered to protect confidentiality. The American Statistician 60(3), 224–232 (2006) https://doi.org/10.1198/000313006x124640 Zhu et al. [2022] Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). https://doi.org/10.1109/BigData55660.2022.10020639 Villani [2009] Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. [2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. 
[2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. 
[2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). https://doi.org/10.1109/BigData55660.2022.10020639 Villani [2009] Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. [2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. 
Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. [2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. 
H. Freeman and Company, New York, NY (2009). Chap. 16
  28. Sun, L., Erath, A.: A Bayesian network approach for population synthesis. Transportation Research Part C: Emerging Technologies 61, 49–62 (2015) https://doi.org/10.1016/j.trc.2015.10.010
[2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). 
https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Karr, A.F., Kohnen, C.N., Oganian, A., Reiter, J.P., Sanil, A.P.: A framework for evaluating the utility of data altered to protect confidentiality. The American Statistician 60(3), 224–232 (2006) https://doi.org/10.1198/000313006x124640 Zhu et al. [2022] Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). https://doi.org/10.1109/BigData55660.2022.10020639 Villani [2009] Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. [2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). 
Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. 
[2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). https://doi.org/10.1109/BigData55660.2022.10020639 Villani [2009] Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. [2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. 
[2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. 
[2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. [2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. 
[2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. 
[2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. 
[2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. 
[2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. 
[2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. 
Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. 
Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. 
Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. 
[2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. 
[2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. 
[2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083
  29. Reiter, J.P.: Using CART to generate partially synthetic public use microdata. Journal of Official Statistics 21(3), 441–462 (2005)
[2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. 
[2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. 
[2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. 
[2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. 
JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. 
AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083
  30. Drechsler, J., Reiter, J.P.: An empirical evaluation of easily implemented, nonparametric methods for generating synthetic datasets. Comput. Stat. Data Anal. 55(12), 3232–3243 (2011)
[2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. 
[2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. 
[2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. 
[2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. 
[2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. 
Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. 
Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. 
Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. 
[2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. 
[2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. 
[2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083
  31. European Medicines Agency: External guidance on the implementation of the European Medicines Agency policy on the publication of clinical data for medicinal products for human use. https://www.ema.europa.eu/en/human-regulatory/marketing-authorisation/clinical-data-publication/support-industry/external-guidance-implementation-european-medicines-agency-policy-publication-clinical-data (2018)
Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. [2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. 
H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 
13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. 
Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. 
Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. 
In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. 
[2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. 
[2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. 
[2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 
13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. 
AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. 
[2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083
  32. Health Canada: Public Release of Clinical Information: guidance document. https://www.canada.ca/en/health-canada/services/drug-health-product-review-approval/profile-public-release-clinical-information-guidance/document.html (2019)
[2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. [2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. 
IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. 
[2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 
14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. 
[2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. 
JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. 
[2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. 
[2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. 
[2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. 
[2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. 
JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. 
AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083
  33. Karr, A.F., Kohnen, C.N., Oganian, A., Reiter, J.P., Sanil, A.P.: A framework for evaluating the utility of data altered to protect confidentiality. The American Statistician 60(3), 224–232 (2006) https://doi.org/10.1198/000313006x124640
[2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 
14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. 
[2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. 
JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. 
[2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. 
[2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. 
[2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. 
[2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. 
JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. 
AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083
  34. Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). https://doi.org/10.1109/BigData55660.2022.10020639
[2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. 
[2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. 
[2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 
13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. 
AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. 
[2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083
  35. Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9
JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083
  36. Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16
  37. Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722
JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. 
JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083
  38. Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605
  39. Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568
  40. Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078
[2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083
  41. Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358
  42. Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298
  43. Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420
  44. Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802
  45. Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369
  46. Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020)
  47. Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083
Authors (4)
  1. Anton Danholt Lautrup (2 papers)
  2. Tobias Hyrup (3 papers)
  3. Arthur Zimek (13 papers)
  4. Peter Schneider-Kamp (31 papers)