
Minimax Estimation of Functionals of Discrete Distributions (1406.6956v5)

Published 26 Jun 2014 in cs.IT, math.IT, math.ST, and stat.TH

Abstract: We propose a general methodology for the construction and analysis of minimax estimators for a wide class of functionals of finite dimensional parameters, and elaborate on the case of discrete distributions, where the alphabet size $S$ is unknown and may be comparable with the number of observations $n$. We treat the respective regions where the functional is "nonsmooth" and "smooth" separately. In the "nonsmooth" regime, we apply an unbiased estimator for the best polynomial approximation of the functional whereas, in the "smooth" regime, we apply a bias-corrected Maximum Likelihood Estimator (MLE). We illustrate the merit of this approach by thoroughly analyzing two important cases: the entropy $H(P) = \sum_{i=1}^S -p_i \ln p_i$ and $F_\alpha(P) = \sum_{i=1}^S p_i^\alpha$, $\alpha>0$. We obtain the minimax $L_2$ rates for estimating these functionals. In particular, we demonstrate that our estimator achieves the optimal sample complexity $n \asymp S/\ln S$ for entropy estimation. We also show that the sample complexity for estimating $F_\alpha(P)$, $0<\alpha<1$, is $n \asymp S^{1/\alpha}/\ln S$, which can be achieved by our estimator but not the MLE. For $1<\alpha<3/2$, we show the minimax $L_2$ rate for estimating $F_\alpha(P)$ is $(n\ln n)^{-2(\alpha-1)}$ regardless of the alphabet size, while the $L_2$ rate for the MLE is $n^{-2(\alpha-1)}$. For all the above cases, the behavior of the minimax rate-optimal estimators with $n$ samples is essentially that of the MLE with $n\ln n$ samples. We highlight the practical advantages of our schemes for entropy and mutual information estimation. We demonstrate that our approach reduces running time and boosts the accuracy compared to various existing approaches. Moreover, we show that the mutual information estimator induced by our methodology leads to significant performance boosts over the Chow--Liu algorithm in learning graphical models.

Citations (242)

Summary

  • The paper introduces a minimax framework that separates analysis into nonsmooth and smooth regimes using polynomial approximations and bias-corrected MLE.
  • It establishes minimax $L_2$ risk bounds, showing that optimal entropy estimation needs roughly $n \asymp S/\ln S$ samples, outperforming classical methods.
  • The proposed estimator achieves effective sample size enlargement, reducing bias and improving accuracy compared to traditional plug-in estimators.

Minimax Estimation of Functionals of Discrete Distributions: An Academic Overview

The research paper focuses on the development of a methodology for constructing minimax estimators for a variety of functionals over discrete distributions, emphasizing entropy and related information measures. The authors address the problem of estimating functionals of unknown finite-dimensional parameters from discrete distributions, particularly when the support size, denoted $S$, may be comparable to or exceed the number of observations $n$. The problem is most delicate in the "nonsmooth" regime, where the bias of traditional plug-in estimators dominates and leads to suboptimal performance.

Methodology and Contributions

The authors introduce a framework that separates the problem into "nonsmooth" and "smooth" regimes for analysis. In the "nonsmooth" regime, they propose the use of unbiased estimators based on polynomial approximations of the functional. Conversely, the "smooth" regime leverages a bias-corrected Maximum Likelihood Estimator (MLE) to improve estimation accuracy.
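To make the two-regime construction concrete, the following Python sketch applies it to entropy. It is a simplified illustration, not the authors' implementation: the threshold constant `c1`, the fixed polynomial degree, and the least-squares fit (standing in for the best uniform polynomial approximation of degree on the order of $\ln n$ used in the paper) are assumptions made here for brevity, and sample splitting and unseen symbols are ignored.

```python
import numpy as np
from numpy.polynomial import Polynomial

def monomial_unbiased(x, n, k):
    """Unbiased estimator of p^k from a Binomial(n, p) count x, via
    falling factorials: E[x(x-1)...(x-k+1)] = n(n-1)...(n-k+1) p^k."""
    num, den = 1.0, 1.0
    for j in range(k):
        num *= x - j
        den *= n - j
    return num / den

def entropy_two_regime(counts, c1=0.5, degree=4):
    """Sketch of a two-regime entropy estimator (in nats)."""
    counts = np.asarray(counts)
    counts = counts[counts > 0]
    n = int(counts.sum())
    delta = c1 * np.log(n) / n  # regime boundary ~ ln(n)/n (assumed constant c1)

    # Least-squares polynomial surrogate for -p ln p on [0, delta];
    # the paper uses the best *uniform* approximation instead.
    grid = np.linspace(1e-12, delta, 200)
    coeffs = Polynomial.fit(grid, -grid * np.log(grid), degree).convert().coef

    est = 0.0
    for x in counts:
        p_hat = x / n
        if p_hat >= delta:
            # "Smooth" regime: plug-in term with a Miller-Madow-style
            # first-order bias correction.
            est += -p_hat * np.log(p_hat) + 1.0 / (2 * n)
        else:
            # "Nonsmooth" regime: evaluate the polynomial surrogate with
            # unbiased monomial estimators, term by term.
            est += sum(a * monomial_unbiased(x, n, k)
                       for k, a in enumerate(coeffs))
    return est
```

The nonsmooth branch exploits the fact that falling factorials of a binomial count are unbiased for the monomials $p^k$, so any polynomial in $p$ can be estimated without bias even though $-p \ln p$ itself cannot.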

Significant theoretical advancements are demonstrated through minimax $L_2$ risk bounds for estimating the entropy $H(P) = \sum_{i=1}^S -p_i \ln p_i$ and the functional $F_\alpha(P) = \sum_{i=1}^S p_i^\alpha$, where $\alpha > 0$. For instance, the paper demonstrates that the optimal sample complexity for consistent entropy estimation is $n \asymp S/\ln S$, an improvement over classical approaches that require $n \gtrsim S$.
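As established in the paper, the minimax $L_2$ risk for entropy estimation takes the form

$$\inf_{\hat{H}} \sup_{P} \mathbb{E}\left(\hat{H} - H(P)\right)^2 \asymp \left(\frac{S}{n \ln n}\right)^2 + \frac{\ln^2 S}{n},$$

where the first term captures the squared bias and the second the variance; requiring the risk to vanish recovers the sample complexity $n \gg S/\ln S$ quoted above.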

Numerical Results and Comparisons

The paper provides a comprehensive analysis of estimation schemes, comparing the proposed methodology with existing approaches on multiple evaluative fronts. A key finding is a phenomenon the authors term "effective sample size enlargement": the proposed estimator with $n$ samples matches the performance of the MLE with $n \ln n$ samples, while maintaining lower bias.
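This enlargement is easy to observe numerically. The sketch below (a uniform distribution, with parameters chosen purely for illustration) contrasts the plug-in MLE's bias with $n$ samples against $n \ln n$ samples; a rate-optimal estimator run on $n$ samples should track the latter.

```python
import numpy as np

rng = np.random.default_rng(0)
S, n = 5000, 5000                      # support size comparable to sample size
p = np.full(S, 1.0 / S)                # uniform distribution, H = ln S nats
H_true = np.log(S)

def plugin_entropy(counts):
    q = counts[counts > 0] / counts.sum()
    return -(q * np.log(q)).sum()

n_big = int(n * np.log(n))             # roughly n ln n samples
trials = 50
bias_n   = np.mean([plugin_entropy(rng.multinomial(n, p)) - H_true
                    for _ in range(trials)])
bias_big = np.mean([plugin_entropy(rng.multinomial(n_big, p)) - H_true
                    for _ in range(trials)])
print(f"plug-in bias, n samples:      {bias_n:+.3f} nats")
print(f"plug-in bias, n ln n samples: {bias_big:+.3f} nats")
```

With $n \approx S$ the plug-in estimate is severely biased downward (more than half a nat in this configuration), while at $n \ln n$ samples the bias shrinks by an order of magnitude.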

Implications and Future Directions

The authors show that their minimax rate-optimal estimator outperforms conventional plug-in estimators (such as the MLE) across a range of tasks by reducing bias while keeping the variance under control. This has the potential to substantially improve applications that rely on information-measure estimation, from mutual information to graphical model learning, by offering markedly better accuracy and computational efficiency.
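Because the mutual information estimator is induced from entropy estimation, one natural wiring, sketched below, goes through the identity $I(X;Y) = H(X) + H(Y) - H(X,Y)$. The function assumes the hypothetical `entropy_two_regime` sketch from earlier and is an illustration, not the authors' implementation.

```python
import numpy as np

def mutual_information(joint_counts, entropy_fn):
    """Estimate I(X;Y) = H(X) + H(Y) - H(X,Y) from a 2-D contingency
    table of counts, delegating all entropy estimation to entropy_fn."""
    joint_counts = np.asarray(joint_counts)
    h_x  = entropy_fn(joint_counts.sum(axis=1))   # marginal counts of X
    h_y  = entropy_fn(joint_counts.sum(axis=0))   # marginal counts of Y
    h_xy = entropy_fn(joint_counts.ravel())       # joint counts
    return max(h_x + h_y - h_xy, 0.0)             # clip small negative estimates
```

In the Chow-Liu algorithm, pairwise mutual information estimates serve as edge weights for a maximum spanning tree; the reported gains come from substituting a better entropy estimator at exactly this step.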

In conclusion, the research marks a significant step forward in the estimation of functionals of discrete distributions, opening avenues for future exploration of similar methods in other statistical and probabilistic domains, especially within high-dimensional statistics and large-scale data analysis. The findings may also inspire further investigation into the fundamental limits and efficiency of functional estimation in complex systems.