2000 character limit reached
On the Statistical Complexity of Sample Amplification (2201.04315v2)
Published 12 Jan 2022 in math.ST, cs.DS, cs.IT, cs.LG, math.IT, and stat.TH
Abstract: The ``sample amplification'' problem formalizes the following question: Given $n$ i.i.d. samples drawn from an unknown distribution $P$, when is it possible to produce a larger set of $n+m$ samples which cannot be distinguished from $n+m$ i.i.d. samples drawn from $P$? In this work, we provide a firm statistical foundation for this problem by deriving generally applicable amplification procedures, lower bound techniques and connections to existing statistical notions. Our techniques apply to a large class of distributions including the exponential family, and establish a rigorous connection between sample amplification and distribution learning.