On Missing Mass Variance (2104.07028v1)
Published 14 Apr 2021 in cs.IT, math.IT, math.ST, and stat.TH
Abstract: The missing mass refers to the probability of elements not observed in a sample and, since the work of Good and Turing during WWII, has been studied extensively in many areas, including ecology, linguistics, networks, and information theory. This work determines the maximal variance of the missing mass for any sample and alphabet sizes. The result helps in understanding the concentration properties of the missing mass.
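To make the definition concrete, the following is a minimal Python sketch, not taken from the paper. It assumes a finite alphabet indexed 0..k-1 with known probabilities and n i.i.d. draws. The function names are hypothetical. It evaluates the missing mass of one sample, and the mean and variance of the missing mass for a given distribution using the standard moment identities; it does not compute the maximal variance that the paper determines.

```python
from itertools import combinations


def missing_mass(probs, sample):
    """Total probability of the symbols that never appear in `sample`."""
    seen = set(sample)
    return sum(p for symbol, p in enumerate(probs) if symbol not in seen)


def missing_mass_mean(probs, n):
    """E[M] = sum_i p_i (1 - p_i)^n under n i.i.d. draws (assumed model)."""
    return sum(p * (1 - p) ** n for p in probs)


def missing_mass_variance(probs, n):
    """Var[M] = E[M^2] - E[M]^2 under n i.i.d. draws (assumed model).

    E[M^2] = sum_i p_i^2 (1 - p_i)^n + 2 sum_{i<j} p_i p_j (1 - p_i - p_j)^n,
    because symbols i and j are both unseen with probability (1 - p_i - p_j)^n.
    """
    second_moment = sum(p * p * (1 - p) ** n for p in probs)
    second_moment += 2 * sum(
        # max(...) guards against tiny negative values from rounding
        p * q * max(0.0, 1 - p - q) ** n
        for p, q in combinations(probs, 2)
    )
    return second_moment - missing_mass_mean(probs, n) ** 2
```

For example, with a uniform two-symbol alphabet and a single draw, exactly one symbol is always unseen, so the missing mass is constant and `missing_mass_variance([0.5, 0.5], 1)` evaluates to zero.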