
Derive k-dependent performance equations for top-k E2LSH/E2LSHoS

Derive explicit analytical equations for how the in-memory E2LSH query time and the number of hash-bucket reads (I/O operations) in E2LSH-on-Storage (E2LSHoS) depend on k in Euclidean c-approximate top-k nearest neighbor search with E2LSH parameters (m, L, S). Such equations would enable precise prediction of the storage IOPS required for top-k queries.
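Below is a minimal sketch of what such equations could look like. It rests on assumptions not taken from the source. The in-memory time is split into two parts. The first is hashing: L compound hash functions of m components each, at an assumed cost t_h per component. The second is distance computation on C(n,k) candidates, at an assumed cost t_d each. The E2LSHoS cost is a count B(n,k) of bucket reads. All symbols other than m, L, n and k are hypothetical notation. To match in-memory speed, storage must complete B(n,k) reads within the in-memory time, assuming I/O is fully overlapped with computation:

```latex
% Hypothetical notation; C(n,k) and B(n,k) are the unknown k-dependent terms.
\begin{aligned}
T_{\mathrm{mem}}(n,k) &\approx L\,m\,t_{h} + C(n,k)\,t_{d},\\
N_{\mathrm{IO}}(n,k) &= B(n,k),\\
\mathrm{IOPS}_{\mathrm{req}}(n,k) &= \frac{N_{\mathrm{IO}}(n,k)}{T_{\mathrm{mem}}(n,k)}.
\end{aligned}
```

In this sketch the hashing term does not depend on k, so any k-dependence must come from C(n,k) and B(n,k). The open problem then reduces to finding closed forms for these two terms in terms of (m, L, S). If N_IO and T_mem grew by the same factor in k, the required IOPS would stay roughly flat. That is one possible reading of the empirical observation quoted below.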


Background

To determine the storage performance needed to match in-memory E2LSH speed, the authors model query time through computational and I/O components. They note that both the in-memory E2LSH query time and the E2LSHoS I/O count grow sublinearly with the database size n. However, they state explicitly that they have no equations for the dependence on k in top-k approximate nearest neighbor search (ANNS). They observe empirically that the IOPS requirement does not change substantially for larger k, but without equations this cannot be formally predicted as k grows.
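Until closed forms exist, the k-dependence can at least be characterized empirically. The sketch below assumes you have per-query measurements of in-memory E2LSH time and E2LSHoS bucket reads at several values of k. The function names and inputs are hypothetical, not from the paper. The required IOPS is computed as reads divided by in-memory time. A power law y ≈ a·k^b is then fitted by least squares on log-log values. The power law is an assumed functional form, not a derived one.

```python
import math
from typing import Dict, Sequence, Tuple


def required_iops(bucket_reads: float, mem_time_s: float) -> float:
    """IOPS needed so that `bucket_reads` I/Os finish within the in-memory
    query time `mem_time_s` (assumes I/O fully overlaps computation)."""
    if mem_time_s <= 0:
        raise ValueError("in-memory time must be positive")
    return bucket_reads / mem_time_s


def fit_power_law(ks: Sequence[float], ys: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least-squares fit of y ~= a * k**b on log-log axes; returns (a, b)."""
    if len(ks) != len(ys) or len(ks) < 2:
        raise ValueError("need at least two (k, y) pairs of equal length")
    xs = [math.log(k) for k in ks]
    ls = [math.log(y) for y in ys]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ls) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("need at least two distinct k values")
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ls))
    b = sxy / sxx                 # slope in log space = exponent of k
    a = math.exp(my - b * mx)     # intercept mapped back to linear scale
    return a, b


def k_exponents(
    ks: Sequence[float],
    mem_times_s: Sequence[float],
    bucket_reads: Sequence[float],
) -> Dict[str, float]:
    """Fitted k-exponents of in-memory time, I/O count, and required IOPS."""
    iops = [required_iops(r, t) for r, t in zip(bucket_reads, mem_times_s)]
    return {
        "time": fit_power_law(ks, mem_times_s)[1],
        "io": fit_power_law(ks, bucket_reads)[1],
        "iops": fit_power_law(ks, iops)[1],
    }
```

The fit is linear in log space, so the IOPS exponent equals the I/O exponent minus the time exponent. An IOPS exponent near zero means the two grow at about the same rate in k.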

Filling this gap would allow the model to determine how increasing k impacts both computation and I/O, strengthening the generality of the analysis framework used throughout the paper.

References

we know they both grow sublinearly in n, and while we do not have equations for k, no substantial change in the IOPS requirements is observed for larger k as shown in Figure 1.

Implementing and Evaluating E2LSH on Storage (2403.16404 - Nakanishi et al., 25 Mar 2024) in Section 4.6 (Requirements for In-memory Speeds)