Privacy of synthetic data: a statistical framework (2109.01748v1)

Published 3 Sep 2021 in cs.CR, math.ST, and stat.TH

Abstract: Privacy-preserving data analysis is emerging as a challenging problem with far-reaching impact. In particular, synthetic data are a promising concept toward solving the aporetic conflict between data privacy and data sharing. Yet, it is known that accurately generating private, synthetic data of certain kinds is NP-hard. We develop a statistical framework for differentially private synthetic data, which enables us to circumvent the computational hardness of the problem. We consider the true data as a random sample drawn from a population Omega according to some unknown density. We then replace Omega by a much smaller random subset Omega*, which we sample according to some known density. We generate synthetic data on the reduced space Omega* by fitting the specified linear statistics obtained from the true data. To ensure privacy we use the common Laplacian mechanism. Employing the concept of Rényi condition number, which measures how well the sampling distribution is correlated with the population distribution, we derive explicit bounds on the privacy and accuracy provided by the proposed method.
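
The pipeline described in the abstract (sample a reduced space, privatize linear statistics with Laplace noise, fit a density on the reduced space, then sample) can be illustrated with a minimal Python sketch. This is not the authors' implementation. The population [0, 1], the uniform sampling density, the three test functions, the exponentiated-gradient least-squares fit, and all names (`laplace_noise`, `private_statistics`, `fit_weights`, `synthesize`) are assumptions chosen for illustration; the paper's actual fitting procedure and constants may differ.

```python
import math
import random


def laplace_noise(scale, rng):
    # The difference of two i.i.d. exponentials with mean `scale`
    # is Laplace-distributed with location 0 and that scale.
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_statistics(data, test_functions, epsilon, rng):
    """Noisy empirical means of test functions with values in [0, 1]."""
    n, k = len(data), len(test_functions)
    # Replacing one record moves each mean by at most 1/n, so the L1
    # sensitivity of the k-vector of means is at most k/n.
    scale = k / (n * epsilon)
    return [sum(f(x) for x in data) / n + laplace_noise(scale, rng)
            for f in test_functions]


def fit_weights(reduced_space, test_functions, targets, steps=300, eta=0.5):
    """Fit a probability vector on the reduced space to the noisy statistics.

    Assumption: least squares solved by exponentiated gradient descent,
    which keeps the weights on the probability simplex.
    """
    m, k = len(reduced_space), len(test_functions)
    feats = [[f(z) for f in test_functions] for z in reduced_space]
    h = [1.0 / m] * m
    for _ in range(steps):
        resid = [sum(h[i] * feats[i][j] for i in range(m)) - targets[j]
                 for j in range(k)]
        grad = [2.0 * sum(resid[j] * feats[i][j] for j in range(k))
                for i in range(m)]
        h = [h[i] * math.exp(-eta * grad[i]) for i in range(m)]
        total = sum(h)
        h = [w / total for w in h]
    return h


def synthesize(true_data, m, n_synth, epsilon, seed=0):
    rng = random.Random(seed)
    # Hypothetical linear statistics: first two moments and one indicator.
    test_functions = [
        lambda x: x,
        lambda x: x * x,
        lambda x: 1.0 if x < 0.5 else 0.0,
    ]
    # Reduced space: m points from a known (here uniform) density,
    # drawn without looking at the true data.
    reduced_space = [rng.random() for _ in range(m)]
    targets = private_statistics(true_data, test_functions, epsilon, rng)
    weights = fit_weights(reduced_space, test_functions, targets)
    synthetic = rng.choices(reduced_space, weights=weights, k=n_synth)
    return reduced_space, weights, synthetic


# Toy "true" data: a sample from a density treated as unknown.
_data_rng = random.Random(1)
true_data = [_data_rng.betavariate(2, 5) for _ in range(2000)]
reduced_space, weights, synthetic = synthesize(true_data, m=50, n_synth=500,
                                               epsilon=1.0)
```

In this sketch the true data are touched only inside `private_statistics`; the reduced space is drawn independently of the data, and the fit and the final sampling are post-processing of the noisy statistics, so under these assumptions they consume no additional privacy budget.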

Authors (3)
  1. March Boedihardjo (15 papers)
  2. Thomas Strohmer (45 papers)
  3. Roman Vershynin (61 papers)
Citations (13)