
Post-processing Private Synthetic Data for Improving Utility on Selected Measures (2305.15538v2)

Published 24 May 2023 in cs.LG, cs.CR, cs.DB, cs.IT, and math.IT

Abstract: Existing private synthetic data generation algorithms are agnostic to downstream tasks. However, end users may have specific requirements that the synthetic data must satisfy. Failure to meet these requirements could significantly reduce the utility of the data for downstream use. We introduce a post-processing technique that improves the utility of the synthetic data with respect to measures selected by the end user, while preserving strong privacy guarantees and dataset quality. Our technique involves resampling from the synthetic data to filter out samples that do not meet the selected utility measures, using an efficient stochastic first-order algorithm to find optimal resampling weights. Through comprehensive numerical experiments, we demonstrate that our approach consistently improves the utility of synthetic data across multiple benchmark datasets and state-of-the-art synthetic data generation algorithms.
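The resampling idea in the abstract can be illustrated with a small sketch. This is not the authors' implementation: it assumes a single utility measure (the mean of one column), assumes the target value is already given (how it would be obtained privately is outside this sketch), and swaps in plain full-batch gradient descent for the stochastic first-order algorithm the abstract mentions. The function names `fit_resampling_weights` and `resample` are hypothetical. The weights are chosen to stay as close to uniform as possible (maximum entropy) while making the weighted measure match the target.

```python
import numpy as np

def fit_resampling_weights(q, target, lr=0.5, steps=2000):
    """Find weights w_i proportional to exp(lam . (q_i - target)) whose
    weighted mean of q matches target, by gradient descent on the convex
    dual g(lam) = log sum_i exp(lam . (q_i - target)).

    q:      (n, k) array, value of each of k measures on each sample
    target: (k,) array, desired weighted mean of each measure
    """
    centered = q - target
    lam = np.zeros(q.shape[1])

    def weights_for(lam):
        logits = centered @ lam
        logits = logits - logits.max()  # stabilise the softmax
        w = np.exp(logits)
        return w / w.sum()

    for _ in range(steps):
        w = weights_for(lam)
        grad = w @ centered  # equals E_w[q] - target
        lam -= lr * grad
    return weights_for(lam)

def resample(data, weights, size, seed=0):
    """Draw rows of data with replacement according to weights."""
    rng = np.random.default_rng(seed)
    idx = rng.choice(len(data), size=size, replace=True, p=weights)
    return data[idx]

# Toy stand-in for synthetic data: the first column has mean near 0,
# but the (assumed) user requirement is a mean of 0.3.
rng = np.random.default_rng(0)
synthetic = rng.normal(loc=0.0, scale=1.0, size=(5000, 2))
q = synthetic[:, :1]          # measure: value of the first column
target = np.array([0.3])

weights = fit_resampling_weights(q, target)
resampled = resample(synthetic, weights, size=5000)
```

In this toy setting the weighted mean of the first column matches the target closely, and the resampled data matches it up to sampling noise. Samples that pull the measure away from the target receive small weights and are therefore drawn rarely, which is one way to read the "filter out" language in the abstract.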

Authors (5)
  1. Hao Wang (1120 papers)
  2. Shivchander Sudalairaj (9 papers)
  3. John Henning (10 papers)
  4. Kristjan Greenewald (65 papers)
  5. Akash Srivastava (50 papers)
Citations (5)