On the Usefulness of Synthetic Tabular Data Generation (2306.15636v1)

Published 27 Jun 2023 in cs.LG

Abstract: Despite recent advances in synthetic data generation, the scientific community still lacks a unified consensus on its usefulness. It is commonly believed that synthetic data can be used for both data exchange and boosting ML training. Privacy-preserving synthetic data generation can accelerate data exchange for downstream tasks, but there is not enough evidence to show how or why synthetic data can boost ML training. In this study, we benchmarked ML performance using synthetic tabular data for four use cases: data sharing, data augmentation, class balancing, and data summarization. We observed marginal improvements for the balancing use case on some datasets. However, we conclude that there is not enough evidence to claim that synthetic tabular data is useful for ML training.
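Among the four use cases, class balancing is the one where the paper reports marginal gains: a generator produces extra minority-class rows so the training set has equal class counts. As a minimal stand-in (assumed for illustration, not the paper's method — it uses random duplication rather than a learned generative model), the balancing step could look like:

```python
import random
from collections import Counter

def oversample_minority(rows, labels, seed=0):
    """Duplicate minority-class rows at random until all classes match the
    majority count. A learned generator (GAN, VAE, diffusion) would instead
    synthesize *new* rows here; duplication only mimics the data flow."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())          # majority-class size
    out_rows, out_labels = list(rows), list(labels)
    for cls, n in counts.items():
        pool = [r for r, y in zip(rows, labels) if y == cls]
        for _ in range(target - n):        # fill the deficit for this class
            out_rows.append(rng.choice(pool))
            out_labels.append(cls)
    return out_rows, out_labels

# Toy imbalanced dataset: 2 positives vs. 4 negatives.
X = [[0.1], [0.2], [0.9], [1.0], [1.1], [1.2]]
y = [1, 1, 0, 0, 0, 0]
Xb, yb = oversample_minority(X, y)
print(Counter(yb))  # both classes now count 4
```

The paper's finding is that replacing this duplication step with samples from a learned tabular generator yields, at best, marginal improvements on some datasets.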

Authors (2)
  1. Dionysis Manousakas
  2. Sergül Aydöre