Papers
Topics
Authors
Recent
Gemini 2.5 Flash
Gemini 2.5 Flash 91 tok/s
Gemini 2.5 Pro 38 tok/s Pro
GPT-5 Medium 19 tok/s
GPT-5 High 23 tok/s Pro
GPT-4o 87 tok/s
GPT OSS 120B 464 tok/s Pro
Kimi K2 171 tok/s Pro
2000 character limit reached

Risk In Context: Benchmarking Privacy Leakage of Foundation Models in Synthetic Tabular Data Generation (2507.17066v1)

Published 22 Jul 2025 in cs.LG and stat.ML

Abstract: Synthetic tabular data is essential for machine learning workflows, especially for expanding small or imbalanced datasets and enabling privacy-preserving data sharing. However, state-of-the-art generative models (GANs, VAEs, diffusion models) rely on large datasets with thousands of examples. In low-data settings, often the primary motivation for synthetic data, these models can overfit, leak sensitive records, and require frequent retraining. Recent work uses large pre-trained transformers to generate rows via in-context learning (ICL), which needs only a few seed examples and no parameter updates, avoiding retraining. But ICL repeats seed rows verbatim, introducing a new privacy risk that has only been studied in text. The severity of this risk in tabular synthesis-where a single row may identify a person-remains unclear. We address this gap with the first benchmark of three foundation models (GPT-4o-mini, LLaMA 3.3 70B, TabPFN v2) against four baselines on 35 real-world tables from health, finance, and policy. We evaluate statistical fidelity, downstream utility, and membership inference leakage. Results show foundation models consistently have the highest privacy risk. LLaMA 3.3 70B reaches up to 54 percentage points higher true-positive rate at 1% FPR than the safest baseline. GPT-4o-mini and TabPFN are also highly vulnerable. We plot the privacy-utility frontier and show that CTGAN and GPT-4o-mini offer better tradeoffs. A factorial study finds that three zero-cost prompt tweaks-small batch size, low temperature, and using summary statistics-can reduce worst-case AUC by 14 points and rare-class leakage by up to 39 points while maintaining over 90% fidelity. Our benchmark offers a practical guide for safer low-data synthesis with foundation models.

List To Do Tasks Checklist Streamline Icon: https://streamlinehq.com

Collections

Sign up for free to add this paper to one or more collections.

Summary

We haven't generated a summary for this paper yet.

Ai Generate Text Spark Streamline Icon: https://streamlinehq.com

Paper Prompts

Sign up for free to create and run prompts on this paper using GPT-5.

Dice Question Streamline Icon: https://streamlinehq.com

Follow-up Questions

We haven't generated follow-up questions for this paper yet.

X Twitter Logo Streamline Icon: https://streamlinehq.com
Youtube Logo Streamline Icon: https://streamlinehq.com