Tight Bounds for Monotone Minimal Perfect Hashing (2207.10556v2)
Abstract: The monotone minimal perfect hash function (MMPHF) problem is the following indexing problem. Given a set $S= {s_1,\ldots,s_n}$ of $n$ distinct keys from a universe $U$ of size $u$, create a data structure $DS$ that answers the following query: [ RankOp(q) = \text{rank of } q \text{ in } S \text{ for all } q\in S ~\text{ and arbitrary answer otherwise.} ] Solutions to the MMPHF problem are in widespread use in both theory and practice. The best upper bound known for the problem encodes $DS$ in $O(n\log\log\log u)$ bits and performs queries in $O(\log u)$ time. It has been an open problem to either improve the space upper bound or to show that this somewhat odd looking bound is tight. In this paper, we show the latter: specifically that any data structure (deterministic or randomized) for monotone minimal perfect hashing of any collection of $n$ elements from a universe of size $u$ requires $\Omega(n \cdot \log\log\log{u})$ expected bits to answer every query correctly. We achieve our lower bound by defining a graph $\mathbf{G}$ where the nodes are the possible ${u \choose n}$ inputs and where two nodes are adjacent if they cannot share the same $DS$. The size of $DS$ is then lower bounded by the log of the chromatic number of $\mathbf{G}$. Finally, we show that the fractional chromatic number (and hence the chromatic number) of $\mathbf{G}$ is lower bounded by $2{\Omega(n \log\log\log u)}$.