Space lower bound for heavy hitters when only the list (no estimates) is output

Determine whether every deterministic streaming algorithm for the heavy hitters problem \(\msf{HeavyHitters}[n, U, k]\) requires \(\Omega(k\log(n/k))\) bits of space in the worst case when it must output only a list of \(k\) elements containing all items with frequency at least \(n/k\), and not the frequency estimates. The question is posed for integers \(n, U, k\) with \(\min\{n, U\} \gg k \ge 2\).
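To make the list-only output requirement concrete, the following is a minimal sketch of the classical Misra-Gries summary, which is not taken from the paper and is shown only as an illustration of a deterministic algorithm that returns a candidate list without estimates. The function name heavy_hitter_candidates and the choice of k counters are assumptions made for this example. With k counters, each counter undercounts its item by at most n/(k+1), so every item with frequency at least n/k survives in the returned list.

```python
def heavy_hitter_candidates(stream, k):
    """Return at most k candidates containing every item with frequency >= n/k.

    Illustrative Misra-Gries sketch (hypothetical helper, not the paper's
    construction). Only the candidate list is returned, never the counts.
    """
    counters = {}  # item -> counter; holds at most k entries at any time
    for x in stream:
        if x in counters:
            counters[x] += 1
        elif len(counters) < k:
            counters[x] = 1
        else:
            # All k counters are in use: decrement each one and drop zeros.
            # The new item x is discarded, so each such step cancels
            # k + 1 stream occurrences of distinct items.
            for key in list(counters):
                counters[key] -= 1
                if counters[key] == 0:
                    del counters[key]
    return list(counters)
```

For example, heavy_hitter_candidates([1, 2, 1, 3, 1, 4, 2, 2, 1, 1], 2) returns a list of at most two elements that includes 1, the only item with frequency at least n/k = 5. The sketch stores up to k item identifiers and k counters; whether a list-only algorithm can use asymptotically fewer than \(k\log(n/k)\) bits is exactly what the open problem asks.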

Background

The paper proves optimal space lower bounds for the standard heavy hitters problem when both the list of k candidates and their frequency estimates are required (Theorem 7.1), leveraging new lower bounds for approximate counting. However, the proof technique critically uses the frequency estimates component.

The authors explicitly state that the case where only the list of \(k\) elements is returned, without any estimates, remains unresolved and pose it as an open problem, asking whether the \(\Omega(k\log(n/k))\) lower bound still holds in this setting.

References

We remark that the proof above depends on the estimates $\wtilde{f_1},\cs,\wtilde{f_k}$. If we are only required to output a list $\{u_1,\cs,u_k\}$, we do not know whether we can prove the streaming lower bound. We leave this as an open problem. Prove or disprove: for any integers $n,U,k$ such that $\min\{n,U\}\gg k\ge 2$, computing $\msf{HeavyHitters}[n,U,k]$ (without outputting $\wtilde{f_1},\cs,\wtilde{f_k}$) requires $\Omega(k\log(n/k))$ bits of space in the streaming model.

Tight Streaming Lower Bounds for Deterministic Approximate Counting (2406.12149 - Wang, 2024) in Open Problem open.heavyhitters, Section 7.1 (Lower Bound for Heavy Hitters)