Minimax rates of entropy estimation on large alphabets via best polynomial approximation (1407.0381v3)

Published 1 Jul 2014 in cs.IT, math.IT, math.ST, and stat.TH

Abstract: Consider the problem of estimating the Shannon entropy of a distribution over $k$ elements from $n$ independent samples. We show that the minimax mean-square error is within universal multiplicative constant factors of $$\Big(\frac{k}{n \log k}\Big)^2 + \frac{\log^2 k}{n}$$ if $n$ exceeds a constant factor of $\frac{k}{\log k}$; otherwise there exists no consistent estimator. This refines the recent result of Valiant-Valiant \cite{VV11} that the minimal sample size for consistent entropy estimation scales according to $\Theta(\frac{k}{\log k})$. The apparatus of best polynomial approximation plays a key role in both the construction of optimal estimators and, via a duality argument, the minimax lower bound.

Citations (217)

Summary

  • The paper establishes minimax risk bounds for Shannon entropy estimation using a best polynomial approximation methodology.
  • It rigorously shows that consistent estimation is possible if and only if the sample size exceeds a constant multiple of k/log k.
  • The results pave the way for designing sample-efficient algorithms in applications such as ecology, linguistics, and neuroscience.

Minimax Rates of Entropy Estimation on Large Alphabets via Best Polynomial Approximation

Overview

This paper addresses the fundamental task of estimating the Shannon entropy of a discrete distribution over a large alphabet from a finite number of independent samples. The main contribution is a characterization of the minimax mean-square error of entropy estimation that is tight up to universal constant factors. This result refines previous work by precisely determining the minimal sample size required for consistent entropy estimation.

Key Results

  • The research establishes that the minimax mean-square error for estimating entropy is within constant factors of:

$$\left(\frac{k}{n \log k}\right)^2 + \frac{\log^2 k}{n}$$

This holds provided that $n$ exceeds a constant multiple of $\frac{k}{\log k}$; otherwise, no consistent estimator exists.

  • Best polynomial approximation plays a crucial role both in constructing rate-optimal estimators and, via a duality argument, in establishing the minimax lower bound; a rough sketch of the estimator construction is given after this list.
  • The results provide a constant-factor characterization of this fundamental limit: the upper and lower bounds on the minimax risk match up to universal multiplicative constants.
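
To make this concrete, below is a minimal Python sketch of the small-count/large-count splitting idea behind polynomial-approximation estimators. It is not the paper's tuned construction: the degree and threshold constants, the Poissonization assumption behind the unbiased-moment step, the Chebyshev-node fit (a stand-in for the true best uniform approximation), and the first-order bias correction on the plug-in branch are all illustrative assumptions.

```python
import numpy as np

def entropy_estimate(counts, n, c_deg=1.0, c_thr=1.0):
    """Sketch of a polynomial-approximation entropy estimate (in nats).

    counts: observed symbol counts from n i.i.d. samples.
    The constants c_deg and c_thr are illustrative, not the paper's tuned values.
    """
    counts = np.asarray(counts, dtype=float)
    L = max(2, int(c_deg * np.log(n)))      # polynomial degree, of order log n
    thr = c_thr * np.log(n)                 # small-count threshold, of order log n
    lam = 2.0 * thr / n                     # approximate phi(x) = -x log x on [0, lam]

    # Near-best polynomial approximation of psi(t) = -t log t on [0, 1],
    # obtained by fitting at Chebyshev nodes; b[j] multiplies t**j.
    t = 0.5 * (1.0 + np.cos(np.pi * (np.arange(L + 1) + 0.5) / (L + 1)))
    b = np.polyfit(t, -t * np.log(t), L)[::-1]

    # Rescale to phi(x) on [0, lam]:  phi(x) ~ sum_j a[j] * x**j.
    a = b * lam ** (1.0 - np.arange(L + 1))
    a[1] -= np.log(lam)

    H = 0.0
    for N in counts[counts > 0]:            # unseen symbols are skipped in this sketch
        if N <= thr:
            # Unbiased estimate of sum_j a[j] * p**j from a (Poissonized) count:
            # E[N * (N-1) * ... * (N-j+1)] = (n * p)**j.
            falling = np.concatenate(([1.0], np.cumprod(N - np.arange(L))))
            H += np.dot(a, falling / float(n) ** np.arange(L + 1))
        else:
            # Well-observed symbol: plug-in with a first-order bias correction.
            p_hat = N / n
            H += -p_hat * np.log(p_hat) + 1.0 / (2.0 * n)
    return H
```

The split reflects the mechanism behind the rate: on rarely observed symbols, replacing $-p\log p$ by a polynomial of degree $O(\log n)$ trades a small, controlled approximation bias for moments that can be estimated without bias, and the squared approximation bias is what gives rise to the $\big(\frac{k}{n \log k}\big)^2$ term.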

Implications

  1. Theoretical Significance: The results contribute fundamentally to the decision-theoretic understanding of entropy estimation under sample-size constraints, specifically clarifying the non-asymptotic rates of the minimax risk. This insight is essential for designing efficient estimation procedures for high-dimensional data or distributions with large support.
  2. Methodological Contribution: The use of best polynomial approximation to balance the bias-variance trade-off in nonparametric functional estimation is innovative. This approach is particularly advantageous compared to traditional plug-in estimators, which suffer from severe bias in the large-alphabet regime (see the simulation sketch after this list).
  3. Practical Applications: This work has profound implications for fields where estimation is regularly conducted over large alphabets from limited samples, such as species diversity estimation in ecology, vocabulary size estimation in linguistics, and the analysis of neural spike trains. The results imply more sample-efficient estimation and potential improvements in resource allocation for data collection and processing.
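
As a quick numerical illustration of the second point (a simulation of the plug-in estimator only, not an experiment from the paper), consider a uniform distribution on k = 50,000 symbols sampled with n of order k/log k, the regime in which the paper shows consistent estimation is still achievable:

```python
import numpy as np

rng = np.random.default_rng(0)
k = 50_000
n = int(2 * k / np.log(k))                  # sample size of order k / log k
true_H = np.log(k)                          # entropy of the uniform distribution, in nats

samples = rng.integers(k, size=n)
counts = np.bincount(samples, minlength=k)
p_hat = counts[counts > 0] / n
plugin_H = -np.sum(p_hat * np.log(p_hat))   # naive plug-in (MLE) estimate

print(f"true entropy     : {true_H:.2f} nats")
print(f"plug-in estimate : {plugin_H:.2f} nats")
```

Because at most n distinct symbols can appear in the sample, the plug-in estimate can never exceed log n, so its downward bias is at least log k - log n in this regime regardless of the randomness; closing this gap is what the paper's polynomial-approximation approach is designed to do.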

Future Directions

  • Refinement and Extension: Beyond entropy, the estimation of other functionals, such as mutual information or support size, might benefit from similar methodological extensions. Identifying analogous polynomial approximation schemes could advance estimation in these settings.
  • Algorithm Development: The theoretical bounds and constructions motivate the development of efficient algorithms for large-scale entropy estimation tasks. Future studies could address computational aspects, optimizing for both sample efficiency and processing speed.
  • Adaptive Methods: Investigations into adaptive approaches that do not require prior knowledge of the alphabet size but leverage sample characteristics dynamically would be valuable. Such techniques could adjust in real-time to different regimes of data availability and alphabet distribution characteristics.

This paper provides a robust foundation for entropy estimation techniques. Its implications extend far into both applied statistics and theoretical information theory, suggesting new lines of inquiry and innovation in statistical and data science practices.