Achieving high-performance filtered kNN on large-scale datasets with large query batches

Develop a filtered k-nearest neighbors (kNN) search method that attains high performance on GPUs for large-scale datasets and large query batches. Such a method would need to overcome the inefficiencies of the two existing brute-force approaches in NVIDIA cuVS: bitmap filtering applied as post-processing over all points, and CSR-based sparse masked matrix multiplication, which incurs scattered memory access patterns.

Background

In its discussion of GPU-based brute-force filtered search, the VecFlow paper reviews two approaches available in NVIDIA cuVS: a tiled brute-force kNN with bitmap filtering applied as a post-processing step across all data points, and a CSR-based sparse masked matrix multiplication that computes distances only for label-matching pairs. The former is computationally prohibitive for large datasets because it computes distances to every point, including those that fail the filter. The latter avoids that wasted work but introduces scattered memory access patterns and additional memory overhead to handle the sparse masks.
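The trade-off between the two approaches can be illustrated with a small CPU-only sketch in plain Python. This is not the cuVS API or the paper's implementation. The function names, the toy data, and the one-label-per-point filter are assumptions made purely for illustration. The first function computes every query-point distance and filters afterwards. The second builds a CSR-style list of matching pairs and computes distances only for those, at the cost of gathering points by index.

```python
import heapq

def sq_dist(a, b):
    # Squared Euclidean distance between two equal-length vectors.
    return sum((x - y) ** 2 for x, y in zip(a, b))

def knn_dense_postfilter(queries, points, allowed, k):
    """Hypothetical stand-in for approach 1: compute all distances,
    then discard points that fail the filter (bitmap post-processing)."""
    results, n_dist = [], 0
    for qi, q in enumerate(queries):
        dists = []
        for pi, p in enumerate(points):
            dists.append((sq_dist(q, p), pi))  # computed even if filtered out
            n_dist += 1
        kept = [(d, pi) for d, pi in dists if allowed[qi][pi]]
        results.append([pi for _, pi in heapq.nsmallest(k, kept)])
    return results, n_dist

def knn_sparse_masked(queries, points, allowed, k):
    """Hypothetical stand-in for approach 2: store matching pairs in a
    CSR-style layout and compute distances only for those pairs."""
    indptr, indices = [0], []
    for row in allowed:
        indices.extend(pi for pi, ok in enumerate(row) if ok)
        indptr.append(len(indices))  # extra memory for the sparse mask
    results, n_dist = [], 0
    for qi, q in enumerate(queries):
        cand = indices[indptr[qi]:indptr[qi + 1]]
        # points[pi] is a gather by index: on a GPU this is the
        # scattered memory access pattern described above.
        dists = [(sq_dist(q, points[pi]), pi) for pi in cand]
        n_dist += len(cand)
        results.append([pi for _, pi in heapq.nsmallest(k, dists)])
    return results, n_dist

# Toy data (assumed): six 2-D points, each carrying one label.
points = [(0, 0), (1, 0), (0, 2), (3, 3), (5, 5), (1, 1)]
labels = [0, 1, 0, 1, 0, 1]
queries = [(0, 0), (4, 4)]
query_labels = [0, 1]

# allowed[qi][pi] is True when point pi passes query qi's filter.
allowed = [[lab == ql for lab in labels] for ql in query_labels]

dense_res, dense_n = knn_dense_postfilter(queries, points, allowed, k=2)
sparse_res, sparse_n = knn_sparse_masked(queries, points, allowed, k=2)
print(dense_res, dense_n)
print(sparse_res, sparse_n)
```

On this toy input both functions return the same neighbors, but the dense version evaluates 12 distances and the sparse version 6. The sketch shows only the work and access-pattern difference; it says nothing about actual GPU throughput.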

Given these limitations, the authors explicitly note that, for large-scale datasets and large query batches, existing solutions do not provide a clear path to high-performance filtered kNN on GPUs. They then propose VecFlow's interleaved scan-based IVF-BFS to address this gap, but the statement itself highlights broader uncertainty in the state of the art for such workloads.

References

As a result, it is unclear how to achieve high performance filtered kNN on large-scale datasets for large batches.

VecFlow: A High-Performance Vector Data Management System for Filtered-Search on GPUs (2506.00812 - Xi et al., 1 Jun 2025) in Section 4.2, subsubsection "Bottom-Level GPU-Friendly Brute Force Search for LS"