Papers
Topics
Authors
Recent
Gemini 2.5 Flash
Gemini 2.5 Flash
125 tokens/sec
GPT-4o
53 tokens/sec
Gemini 2.5 Pro Pro
42 tokens/sec
o3 Pro
4 tokens/sec
GPT-4.1 Pro
47 tokens/sec
DeepSeek R1 via Azure Pro
28 tokens/sec
2000 character limit reached

Competitive Distribution Estimation (1503.07940v1)

Published 27 Mar 2015 in cs.IT, cs.DS, cs.LG, math.IT, math.ST, and stat.TH

Abstract: Estimating an unknown distribution from its samples is a fundamental problem in statistics. The common, min-max, formulation of this goal considers the performance of the best estimator over all distributions in a class. It shows that with $n$ samples, distributions over $k$ symbols can be learned to a KL divergence that decreases to zero with the sample size $n$, but grows unboundedly with the alphabet size $k$. Min-max performance can be viewed as regret relative to an oracle that knows the underlying distribution. We consider two natural and modest limits on the oracle's power. One where it knows the underlying distribution only up to symbol permutations, and the other where it knows the exact distribution but is restricted to use natural estimators that assign the same probability to symbols that appeared equally many times in the sample. We show that in both cases the competitive regret reduces to $\min(k/n,\tilde{\mathcal{O}}(1/\sqrt n))$, a quantity upper bounded uniformly for every alphabet size. This shows that distributions can be estimated nearly as well as when they are essentially known in advance, and nearly as well as when they are completely known in advance but need to be estimated via a natural estimator. We also provide an estimator that runs in linear time and incurs competitive regret of $\tilde{\mathcal{O}}(\min(k/n,1/\sqrt n))$, and show that for natural estimators this competitive regret is inevitable. We also demonstrate the effectiveness of competitive estimators using simulations.

Summary

We haven't generated a summary for this paper yet.