Papers
Topics
Authors
Recent
Gemini 2.5 Flash
Gemini 2.5 Flash 89 tok/s
Gemini 2.5 Pro 54 tok/s Pro
GPT-5 Medium 27 tok/s
GPT-5 High 22 tok/s Pro
GPT-4o 89 tok/s
GPT OSS 120B 457 tok/s Pro
Kimi K2 169 tok/s Pro
2000 character limit reached

A Conditional GAN for Tabular Data Generation with Probabilistic Sampling of Latent Subspaces (2508.00472v1)

Published 1 Aug 2025 in cs.LG

Abstract: The tabular form constitutes the standard way of representing data in relational database systems and spreadsheets. But, similarly to other forms, tabular data suffers from class imbalance, a problem that causes serious performance degradation in a wide variety of machine learning tasks. One of the most effective solutions dictates the usage of Generative Adversarial Networks (GANs) in order to synthesize artificial data instances for the under-represented classes. Despite their good performance, none of the proposed GAN models takes into account the vector subspaces of the input samples in the real data space, leading to data generation in arbitrary locations. Moreover, the class labels are treated in the same manner as the other categorical variables during training, so conditional sampling by class is rendered less effective. To overcome these problems, this study presents ctdGAN, a conditional GAN for alleviating class imbalance in tabular datasets. Initially, ctdGAN executes a space partitioning step to assign cluster labels to the input samples. Subsequently, it utilizes these labels to synthesize samples via a novel probabilistic sampling strategy and a new loss function that penalizes both cluster and class mis-predictions. In this way, ctdGAN is trained to generate samples in subspaces that resemble those of the original data distribution. We also introduce several other improvements, including a simple, yet effective cluster-wise scaling technique that captures multiple feature modes without affecting data dimensionality. The exhaustive evaluation of ctdGAN with 14 imbalanced datasets demonstrated its superiority in generating high fidelity samples and improving classification accuracy.

List To Do Tasks Checklist Streamline Icon: https://streamlinehq.com

Collections

Sign up for free to add this paper to one or more collections.

Summary

We haven't generated a summary for this paper yet.

Ai Generate Text Spark Streamline Icon: https://streamlinehq.com

Paper Prompts

Sign up for free to create and run prompts on this paper using GPT-5.

Dice Question Streamline Icon: https://streamlinehq.com

Follow-up Questions

We haven't generated follow-up questions for this paper yet.

Youtube Logo Streamline Icon: https://streamlinehq.com

Don't miss out on important new AI/ML research

See which papers are being discussed right now on X, Reddit, and more:

“Emergent Mind helps me see which AI papers have caught fire online.”

Philip

Philip

Creator, AI Explained on YouTube