Low-Selectivity Filtered ANNS Techniques

Updated 26 March 2026

Low-selectivity filtered ANNS covers scenarios where only a tiny fraction (≤1%) of the data satisfies structured filters, which calls for specialized indexing techniques.
The approach addresses challenges inherent in traditional methods, such as graph disconnection and candidate explosion, by integrating filter-aware strategies.
Recent advancements like Falconn++, Curator, and JAG demonstrate robust performance gains and improved recall through hybrid and partition-based indexing solutions.

Low-selectivity filtered approximate nearest neighbor search (ANNS) concerns the retrieval of nearest vector neighbors jointly subject to similarity and structured attribute predicates, in regimes where only a small fraction of the data (typically s ≤ 1%) passes the filter. This setting is prevalent in modern embedding-based search, recommendation, and vector database systems that must honor metadata, label, or range constraints alongside high-dimensional vector proximity. The low-selectivity regime fundamentally alters the performance characteristics and index design space, requiring dedicated techniques to prevent either coverage breakdown (as in graph-based methods) or prohibitive latency/candidate explosion (as in naive post-filtering approaches). Advancements in this domain include both algorithmic contributions (e.g., filter-integrated indexes, partitioning strategies) and empirical system-level findings on robust execution planning.

1. Formal Problem Definition and Selectivity Metrics

Let $S\subset \mathbb{R}^d$ be a dataset of $n$ vectors, each annotated with metadata $A_x$ (categorical, numerical, or multi-label attributes). For a filter predicate $\sigma$ , let $P_{(\sigma)}=\{x\in S:\ x\text{ satisfies }\sigma\}$ be the filtered target subset, with selectivity $s=|P_{(\sigma)}|/n$ . A filtered-ANNS query $(q, \sigma, k)$ seeks the $k$ nearest neighbors to query $q$ within $P_{(\sigma)}$ , according to the desired similarity metric (e.g., Euclidean, angular).
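As a reference point for the definition above, the following is a minimal sketch of an exact filtered k-NN query: it materializes $P_{(\sigma)}$, reports the selectivity $s$, and scans the survivors by brute force. The function name `filtered_knn`, the dictionary-based metadata, and the toy data are illustrative assumptions, not part of any cited system.

```python
import heapq
import math


def filtered_knn(vectors, attrs, query, predicate, k):
    """Exact filtered k-NN: pre-filter, then brute-force scan of the survivors."""
    # P_sigma: indices of the vectors whose metadata satisfies the predicate
    passing = [i for i, a in enumerate(attrs) if predicate(a)]
    # Selectivity s = |P_sigma| / n
    selectivity = len(passing) / len(vectors)
    # k nearest neighbors of the query within P_sigma (Euclidean distance)
    nearest = heapq.nsmallest(
        k, passing, key=lambda i: math.dist(vectors[i], query)
    )
    return nearest, selectivity


# Hypothetical toy data: five 2-d vectors with one categorical attribute each
vectors = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0), (3.0, 3.0), (0.5, 0.5)]
attrs = [{"label": "a"}, {"label": "b"}, {"label": "a"},
         {"label": "a"}, {"label": "b"}]

ids, s = filtered_knn(vectors, attrs, (0.0, 0.0),
                      lambda a: a["label"] == "a", k=2)
print(ids, s)
```

An approximate method is judged against this exact answer: recall is the overlap between its returned ids and the ids produced here.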

Filter selectivity $s$ formalizes the core difficulty: as $s \to 0$, the fraction of qualifying vectors approaches zero. In vector database and IR literature, $s \le 1\%$ defines the "low-selectivity" regime, requiring algorithms to efficiently retrieve results from highly sparse subsets without full scans (Jin et al., 3 Jan 2026, Li et al., 22 Aug 2025, Amanbayev et al., 11 Feb 2026, Shi et al., 9 Sep 2025).

Candidate set size and recall metrics follow from standard ANNS definitions but must be understood relative to $s$. Expected candidate set size after pure pre-filter is $|P_{(\sigma)}| = s\,n$. Global-Local Selectivity (GLS) correlation, defined from the global selectivity of the filter and the local selectivity around a query $q$, further quantifies the spatial alignment (or depletion) of the filter predicate with the geometric neighborhood (Amanbayev et al., 11 Feb 2026).
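The two quantities that GLS relates can be measured empirically. The sketch below computes a global selectivity over the whole dataset and a local selectivity over the `m` nearest neighbors of a query. Taking the local neighborhood to be the `m` nearest neighbors is an assumption made for illustration, and the GLS correlation formula itself is not reproduced here.

```python
import heapq
import math


def global_selectivity(attrs, predicate):
    """Fraction of the whole dataset that passes the filter."""
    return sum(1 for a in attrs if predicate(a)) / len(attrs)


def local_selectivity(vectors, attrs, query, predicate, m):
    """Fraction of the query's m nearest neighbors that pass the filter.

    Assumption: the 'local' neighborhood is the m nearest vectors.
    """
    nearest = heapq.nsmallest(
        m, range(len(vectors)), key=lambda i: math.dist(vectors[i], query)
    )
    return sum(1 for i in nearest if predicate(attrs[i])) / len(nearest)


# Hypothetical data: matching points are clustered at one end of a line
vectors = [(0.0,), (1.0,), (2.0,), (10.0,), (11.0,), (12.0,)]
attrs = [1, 1, 1, 0, 0, 0]


def passes(a):
    return a == 1


g = global_selectivity(attrs, passes)
near = local_selectivity(vectors, attrs, (0.0,), passes, m=3)
far = local_selectivity(vectors, attrs, (11.0,), passes, m=3)
print(g, near, far)
```

A query whose local selectivity is far below the global value sits in a region depleted of matching vectors, which is the hard case for search-then-filter execution.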

2. Challenges of Low-Selectivity Filtering in ANNS

Low selectivity poses distinct theoretical and practical challenges:

Graph connectivity breakdown: In proximity-graph indices (HNSW, Vamana, etc.), inducing the subgraph on $P_{(\sigma)}$ typically produces disconnected or sparsely connected components when $s$ is small. Maintaining adequate connectivity would necessitate a much higher average degree, incurring quadratic or worse resource explosion (Jin et al., 3 Jan 2026, Amanbayev et al., 11 Feb 2026, Li et al., 22 Aug 2025).
Candidate inefficiency in filtering sequences: In "post-filter" methods, ANNS is run unaware of the filter, necessitating large candidate pools to ensure at least $k$ valid results among the retrieved candidates, leading to pools on the order of $k/s$ and substantial slowdown (Shi et al., 9 Sep 2025).
No universal index structure: Standard tree-, graph-, or hashing-based indexes exhibit differing points of failure under low selectivity: trees degenerate to exhaustive leaf scans, graphs disconnect, and LSH or partition indexes see steep growth in candidate-set size or probe count (Li et al., 22 Aug 2025).
Dynamic, complex, or ad hoc predicates: Supporting arbitrary filters (multi-attribute, Boolean, or dynamically composed) exacerbates the problem, as most approaches are specialized for fixed filter types or cannot pre-materialize indexes for all possible subpopulations (Jin et al., 3 Jan 2026, Xu et al., 10 Feb 2026).
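The candidate blow-up of post-filtering can be made concrete with a small simulation. The sketch below assumes each retrieved candidate passes the filter independently with probability $s$ (an idealization that ignores attribute–vector correlation), in which case the number of candidates inspected before $k$ valid ones are found has mean $k/s$. The function names are hypothetical.

```python
import random


def expected_pool_size(k, s):
    """Mean candidates inspected until k pass, under independent passes."""
    return k / s


def simulated_pool_size(k, s, trials=1000, seed=0):
    """Monte Carlo estimate of the same quantity."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        inspected = 0
        valid = 0
        # Keep pulling candidates until k of them satisfy the filter
        while valid < k:
            inspected += 1
            if rng.random() < s:
                valid += 1
        total += inspected
    return total / trials


for s in (0.1, 0.01):
    print(s, expected_pool_size(10, s), round(simulated_pool_size(10, s)))
```

When matching vectors are depleted near the query, the required pool is larger than this independent-pass estimate suggests.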

3. Algorithmic Paradigms and Methodology Landscape

As systematized in (Shi et al., 9 Sep 2025), filtered ANNS strategies fall into three broad categories, with distinct behaviors under low selectivity:

Paradigm	Typical Indexing	Typical Query Behavior at Low $s$
Filter-then-Search	Per-label listing; bitset/inverted index	Candidate set shrinks with $s$; usually efficient; robust for containment/equality filters; can preclude graph methods
Search-then-Filter	Graph (HNSW), IVF	Required candidate set grows as $1/s$; recall and QPS degrade sharply for small $s$; major bottleneck for general filters
Hybrid-Search	Filter-aware graph or cluster	Built-in label/range constraints during index build/neighbor expansion; robust but often needs per-filter or multi-subgraph maintenance

Specialized techniques include:

Pre-filter brute force for a very small candidate pool: exact search is often optimal (Shi et al., 9 Sep 2025, Amanbayev et al., 11 Feb 2026).
Attribute-aware entry points in graphs improve the probability of reachability for $P_{(\sigma)}$ at low $s$ (Li et al., 22 Aug 2025).
Partition- and tree-based dual indexes (e.g., Curator) build a shared base index with lightweight per-filter overlays to provide near-flat scaling with respect to $s$ (Jin et al., 3 Jan 2026).
Hybrid edge-pruning strategies (Filtered-DiskANN, ACORN, etc.) partition or jointly prune neighbors to preserve navigability within valid regions (Li et al., 22 Aug 2025, Shi et al., 9 Sep 2025).
Continuous filter/attribute distances and unified index construction (as in JAG) substitute hard binary filtering with continuous navigational gradients supporting robust connectivity for all predicate types (Xu et al., 10 Feb 2026).

4. Advances in Filter-Aware Indexing and Algorithms

4.1 Locality-Sensitive Filtering (Falconn++)

Falconn++ introduces a locality-sensitive filter for hashing-based ANNS. For cross-polytope LSH hash tables, the scheme filters bucket contents, retaining only the vectors that pass a thresholded filtering condition, so that only a top fraction of each bucket is kept (low selectivity) (Pham et al., 2022). Provably, for close/far point collisions, the probability of passing the filter for near neighbors is lower bounded, while far points are aggressively eliminated. The exponent $\rho$ governing query time is lower than that of the original Falconn, improving asymptotic performance. Empirically, speedups over Falconn, and superior multi-threaded scaling relative to HNSW, are reported at high recall.

4.2 Partition-Based Dual Indexing (Curator)

Curator applies hierarchical $k$-means to build a shared clustering tree, embedding per-label and per-predicate subindexes as buffers and Bloom filters. For a label or predicate of low selectivity $s$, search is restricted to leaves indicated by Bloom filter presence, so that query cost is driven by the leaves that contain matching vectors. The inclusion of on-the-fly construction for complex predicates (virtual labels) minimizes the overhead of arbitrary filter evaluation. Empirical measurements demonstrate a large query-latency reduction at low selectivity, with small build and memory overheads (Jin et al., 3 Jan 2026).
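The idea of restricting search to partitions known to contain a label can be sketched in a few lines. The class below is a deliberately simplified stand-in, not Curator's implementation: it uses a flat set of fixed centroids rather than a hierarchical clustering tree, and a plain per-partition dictionary keyed by label rather than Bloom filters and buffers. All names and the `nprobe` parameter are assumptions for illustration.

```python
import heapq
import math
from collections import defaultdict


class LabelAwarePartitionIndex:
    """Flat partition index with per-partition label membership."""

    def __init__(self, centroids):
        self.centroids = list(centroids)
        # One dict per partition: label -> list of (vector id, vector)
        self.buckets = [defaultdict(list) for _ in self.centroids]

    def add(self, vec_id, vec, labels):
        # Assign the vector to its nearest centroid
        part = min(range(len(self.centroids)),
                   key=lambda c: math.dist(vec, self.centroids[c]))
        for label in labels:
            self.buckets[part][label].append((vec_id, vec))

    def search(self, query, label, k, nprobe=2):
        # Skip every partition that holds no vector with this label
        candidates = [c for c, b in enumerate(self.buckets) if label in b]
        # Probe the nprobe closest remaining partitions
        probe = heapq.nsmallest(
            nprobe, candidates,
            key=lambda c: math.dist(query, self.centroids[c]))
        hits = [(math.dist(query, vec), vec_id)
                for c in probe
                for vec_id, vec in self.buckets[c][label]]
        return [vec_id for _, vec_id in heapq.nsmallest(k, hits)]


index = LabelAwarePartitionIndex([(0.0, 0.0), (10.0, 10.0)])
index.add(0, (0.0, 1.0), {"red"})
index.add(1, (1.0, 0.0), {"blue"})
index.add(2, (9.0, 9.0), {"red"})
index.add(3, (10.0, 11.0), {"blue", "red"})

print(index.search((0.0, 0.0), "red", k=2))
```

Because only vectors carrying the queried label are ever scanned, the work per query tracks the size of the matching subset within the probed partitions rather than the partition size.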

4.3 Filter-Agnostic Graph Methods (JAG)

JAG defines joint "filter" and "attribute" distances, leveraging lexicographic ordering and multi-threshold edge construction to ensure robust connectivity across all selectivity spectra and filter types (label, range, subset, Boolean). Instead of hard pre-filtering, the query traverses the graph greedily according to a continuous joint distance, ensuring search does not terminate prematurely even if valid points are sparse. Experimental results show that baselines stagnate at low recall in the extreme low-selectivity regime, whereas JAG attains perfect recall at QPS values an order of magnitude higher (Xu et al., 10 Feb 2026).
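To illustrate what a lexicographic ordering over a filter distance and a vector distance looks like, the sketch below ranks a flat candidate list by the pair (filter distance, vector distance) for a range predicate. This is not JAG's construction or traversal: the definition of the filter distance as the gap to the nearest interval bound, and all function names, are assumptions chosen for illustration.

```python
import heapq
import math


def range_filter_distance(value, low, high):
    """0 inside [low, high]; otherwise the gap to the nearest bound."""
    if value < low:
        return low - value
    if value > high:
        return value - high
    return 0.0


def joint_key(vec, attr, query, low, high):
    # Lexicographic order: filter distance first, vector distance second
    return (range_filter_distance(attr, low, high), math.dist(vec, query))


def rank_candidates(points, query, low, high, k):
    """points: iterable of (id, vector, numeric attribute)."""
    best = heapq.nsmallest(
        k, points, key=lambda p: joint_key(p[1], p[2], query, low, high))
    return [p[0] for p in best]


# Hypothetical candidates; the range filter is 10 <= attr <= 20
points = [
    (0, (0.0, 0.0), 5.0),
    (1, (0.1, 0.0), 50.0),
    (2, (3.0, 3.0), 12.0),
    (3, (1.0, 1.0), 9.0),
]
print(rank_candidates(points, (0.0, 0.0), 10.0, 20.0, k=3))
```

Points that violate the filter still receive a finite, graded score, so an ordering of this kind can steer a search toward the valid region instead of discarding non-matching points outright.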

4.4 Hybrid Graph–Partition Approaches

Empirical benchmarking in FAISS, Milvus, and pgvector supports the partitioned index approach (e.g., IVFFlat), showing that at low selectivity IVFFlat delivers higher QPS than HNSW, with recall reported for both under the same settings (Amanbayev et al., 11 Feb 2026). Systems with hybrid execution, e.g., Milvus’s adaptive brute-force fallback, achieve robust recall at the expense of a "latency floor." Cost-based optimizers in systems like pgvector may underperform unless filter attributes are indexed and plan selection is forced manually.

5. Empirical Trends and Best Practices for Low-Selectivity Queries

Across extensive benchmarks (Li et al., 22 Aug 2025, Shi et al., 9 Sep 2025, Amanbayev et al., 11 Feb 2026), several patterns hold in the $s \le 1\%$ regime:

Pre-filtering followed by brute-force search is optimal when the filtered candidate pool is very small.
Post-filter methods (HNSW, IVF-PQ, etc.) see recall and QPS degrade sharply as $s \to 0$, often running into a recall ceiling.
Graph-based hybrid or filter-aware methods maintain higher recall, but only if edge pruning and entry-point selection are cognizant of attribute structure.
Dual-index, partition, and tree overlays (e.g., Curator, IVFFlat with cluster filtering) provide near-flat or sublinear query-scaling with $s$, offering orders of magnitude better throughput.
JAG and threshold-based graph methods avoid navigational dead-ends for all predicates, combining filter-robustness with vector similarity, thus yielding high recall across the selectivity spectrum.
Parameter tuning: Partition count, cluster size, beam width, and probe parameters must be (re-)tuned to target recall at low $s$; optimal values shift when selectivity or attribute–vector correlations change.

6. Practical Recommendations and System Integration

Key operational guidelines emerging from recent research:

For $s \le 1\%$, prefer partition-based indexes (IVFFlat, Curator) or per-filter overlays if predicates are known or can be cached (Jin et al., 3 Jan 2026, Amanbayev et al., 11 Feb 2026).
Use filter-aware graph algorithms (e.g., JAG, Filtered-DiskANN, ACORN) only if connections preserving reachability within $P_{(\sigma)}$ can be maintained at manageable cost.
For mixed or unknown selectivity, maintain hybrid indices, switch execution paths by estimating $s$ at query time, or use adaptive fallback to exact scan below a threshold (Amanbayev et al., 11 Feb 2026).
Pre-index filter attributes (e.g., B-trees in SQL systems) to exploit filter-first planning and avoid spurious recall/latency trade-offs (Amanbayev et al., 11 Feb 2026).
Always monitor global-local selectivity correlation for challenging queries and dynamically tune execution plans or search parameters when it signals poor alignment between the filter and the query neighborhood (Amanbayev et al., 11 Feb 2026).
Evaluate and tune edge-pruning parameters, partition counts, and buffer capacities specifically for the low-selectivity workload profile (Li et al., 22 Aug 2025, Shi et al., 9 Sep 2025).
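The guidelines above amount to a selectivity-driven dispatch, which can be sketched as a small planner. The function name, the plan labels, and both thresholds (`brute_force_limit`, `low_selectivity`) are hypothetical placeholders that would need tuning per system. The final branch, which routes higher-selectivity queries to a graph index, is also an assumption rather than a recommendation taken from the cited work.

```python
def choose_plan(estimated_matches, n,
                brute_force_limit=10_000, low_selectivity=0.01):
    """Pick an execution path from an estimate of |P_sigma|.

    All thresholds are illustrative assumptions, not published values.
    """
    s = estimated_matches / n
    if estimated_matches <= brute_force_limit:
        # Tiny filtered subset: exact scan after pre-filtering
        return "prefilter_bruteforce"
    if s <= low_selectivity:
        # Low selectivity but too many matches to scan exactly
        return "partition_index"
    # Assumed default for higher selectivity
    return "graph_index"


print(choose_plan(500, 1_000_000))
print(choose_plan(50_000, 10_000_000))
print(choose_plan(500_000, 1_000_000))
```

In a SQL-backed system the estimate of matching rows would typically come from statistics on an indexed filter attribute, which is one reason pre-indexing those attributes matters.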

7. Limitations and Open Research Directions

Despite recent advances, several problems remain:

Deriving optimal query-time/space tradeoffs for arbitrary (dynamic, ad hoc) predicates and providing closed-form cost models for practical index classes remain unresolved (Li et al., 22 Aug 2025).
No index achieves worst-case sublinear performance with full coverage for arbitrary filters as $s \to 0$; most techniques fall back to brute force within the rarest subpopulations.
Auto-tuning of all key hyperparameters (e.g., cluster/partition count, graph degree, probe width) under time-varying and workload-driven $s$ is unsolved at scale (Li et al., 22 Aug 2025).
Ensuring robustness against adversarial or highly skewed attribute–vector correlations (as measured by GLS correlation) is an open area, with only preliminary adaptivity in current systems (Amanbayev et al., 11 Feb 2026).
Efficient support for complex, user-defined predicates (including composite Boolean logic and continuous attribute mixtures) without exponential subindex proliferation calls for data structures that combine partitioning, filter-aware graph augmentation, and perhaps continuous filter-distance models as in JAG (Xu et al., 10 Feb 2026).

Continued rigorous benchmarking (e.g., unified benchmarks (Shi et al., 9 Sep 2025), extensible system testbeds (Amanbayev et al., 11 Feb 2026)) and public algorithm implementations are facilitating comparative progress and hybrid deployments across the research and engineering spectrum.