Papers
Topics
Authors
Recent
Gemini 2.5 Flash
Gemini 2.5 Flash
149 tokens/sec
GPT-4o
7 tokens/sec
Gemini 2.5 Pro Pro
45 tokens/sec
o3 Pro
4 tokens/sec
GPT-4.1 Pro
38 tokens/sec
DeepSeek R1 via Azure Pro
28 tokens/sec
2000 character limit reached

The power of private likelihood-ratio tests for goodness-of-fit in frequency tables (2109.09630v2)

Published 20 Sep 2021 in math.ST, stat.ME, stat.ML, and stat.TH

Abstract: Privacy-protecting data analysis investigates statistical methods under privacy constraints. This is a rising challenge in modern statistics, as the achievement of confidentiality guarantees, which typically occurs through suitable perturbations of the data, may determine a loss in the statistical utility of the data. In this paper, we consider privacy-protecting tests for goodness-of-fit in frequency tables, this being arguably the most common form of releasing data, and present a rigorous analysis of the large sample behaviour of a private likelihood-ratio (LR) test. Under the framework of $(\varepsilon,\delta)$-differential privacy for perturbed data, our main contribution is the power analysis of the private LR test, which characterizes the trade-off between confidentiality, measured via the differential privacy parameters $(\varepsilon,\delta)$, and statistical utility, measured via the power of the test. This is obtained through a Bahadur-Rao large deviation expansion for the power of the private LR test, bringing out a critical quantity, as a function of the sample size, the dimension of the table and $(\varepsilon,\delta)$, that determines a loss in the power of the test. Such a result is then applied to characterize the impact of the sample size and the dimension of the table, in connection with the parameters $(\varepsilon,\delta)$, on the loss of the power of the private LR test. In particular, we determine the (sample) cost of $(\varepsilon,\delta)$-differential privacy in the private LR test, namely the additional sample size that is required to recover the power of the Multinomial LR test in the absence of perturbation. Our power analysis rely on a non-standard large deviation analysis for the LR, as well as the development of a novel (sharp) large deviation principle for sum of i.i.d. random vectors, which is of independent interest.

Citations (1)

Summary

We haven't generated a summary for this paper yet.