Cause of GloVe-25 sensitivity to the CrackIVF min_pts parameter
Determine why increasing the CrackIVF heuristic parameter min_pts from 2 to 32 improves the queries-per-second (QPS) versus recall trade-off on the GloVe-25 dataset. In particular, establish whether the improvement arises mainly because the 25-dimensional embeddings make the build-operations budget grow more slowly, leading to excessive local imbalances at min_pts=2, or whether other factors are responsible, such as convergence to a smaller final number of partitions or the dataset's query distribution.
References
Although we can not make a definite statement, we hypothesize that this can be attributed to the fact that this dataset only has 25-dimensional embeddings.
Source: Cracking Vector Search Indexes (arXiv:2503.01823, Mageirakos et al., 3 Mar 2025), Section 5 (Experiments), Control Mechanisms Ablation Study, "Varying heuristic rule parameters".