FusedANN: Hybrid Filtered ANNS Methods
- FusedANN is a hybrid approach that integrates filtering constraints into ANN search, effectively addressing low-selectivity challenges in high-dimensional spaces.
- It combines filter-then-search, search-then-filter, and hybrid indexing paradigms to enhance recall and query speed under stringent filtering conditions.
- The method leverages state-of-the-art index structures and dynamic parameter tuning to optimize performance in semantic retrieval and vector database applications.
Low-selectivity filtered Approximate Nearest Neighbor Search (ANNS) is the task of finding the top-$k$ nearest vectors to a query vector, under some distance metric, subject to additional filtering constraints (such as labels, ranges, or complex predicates), in the regime where the filter selectivity $s$ (the fraction of database points passing the predicate) is very small. This regime poses unique algorithmic and systems challenges because qualifying candidates are sparse and fragmented in high-dimensional spaces. Effective and efficient methods for low-selectivity filtered ANNS are of central importance for semantic retrieval, retrieval-augmented generation (RAG), and vector database systems with structured or metadata-based constraints.
1. Formalization and Selectivity Metrics
Let $D$ be a dataset of $n$ vectors in $\mathbb{R}^d$, each accompanied by structured metadata (labels, attributes, timestamps, etc.). A filter predicate $p$ defines a subset $D_p \subseteq D$ of qualifying vectors, with selectivity $s = |D_p| / |D|$ (Jin et al., 3 Jan 2026, Shi et al., 9 Sep 2025, Amanbayev et al., 11 Feb 2026). A low-selectivity filtered-ANNS query thus seeks:
- The top-$k$ nearest neighbors of a query vector $q$, restricted to $D_p$.
Selectivity notation:
- Global selectivity: the fraction of all database points passing the predicate
- Local selectivity for a query $q$, defined in terms of the true $k$-nearest neighbors of $q$
- GLS correlation per query (Amanbayev et al., 11 Feb 2026)
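In the low-selectivity regime (small selectivity $s$), the expected candidate set after pure pre-filtering contains $s \cdot n$ of the $n$ database vectors, and search/filter hybrids typically retrieve $k' \approx k/s$ raw candidates so that roughly $k$ survive the filter (Shi et al., 9 Sep 2025).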
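The quantities above can be illustrated with a minimal sketch. The helper names `selectivity` and `overfetch_size` are hypothetical, and the over-fetch rule assumes that qualifying points are spread uniformly among the nearest neighbors, which is an idealisation rather than something the cited works guarantee.

```python
import math

def selectivity(metadata, predicate):
    """Fraction of records whose metadata satisfies the predicate."""
    if not metadata:
        return 0.0
    passing = sum(1 for m in metadata if predicate(m))
    return passing / len(metadata)

def overfetch_size(k, s, n):
    """Raw candidates to retrieve so that about k survive post-filtering.

    Assumption: a fraction s of the raw hits qualifies, so k / s raw hits
    are needed on average. The result is capped at the dataset size n.
    """
    if s <= 0.0:
        return n  # nothing is expected to pass; fall back to the full dataset
    return min(n, math.ceil(k / s))

# Toy metadata: one record in a hundred carries the "rare" label.
metadata = [{"label": "rare" if i % 100 == 0 else "common"} for i in range(10_000)]
s = selectivity(metadata, lambda m: m["label"] == "rare")
expected_candidates = s * len(metadata)  # size of the pre-filtered set
k_raw = overfetch_size(k=10, s=s, n=len(metadata))
```

With a selectivity of one percent, a post-filtering pipeline in this toy setting has to retrieve on the order of a hundred times more raw candidates than the requested $k$, which is why post-filtering becomes expensive as selectivity shrinks.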
2. Algorithmic Paradigms and Structures
A comprehensive taxonomy of filtered ANNS algorithms distinguishes three main paradigms according to the interplay of index construction and filtering (Shi et al., 9 Sep 2025, Li et al., 22 Aug 2025):
- Filter-Then-Search: Explicitly selects the subset before running ANNS. Example: UNG (Unified Navigating Graph) materializes a pre-filtered candidate set then applies graph search; ACORN with pre-filtering (Shi et al., 9 Sep 2025).
- Search-Then-Filter: Runs standard ANNS on the whole dataset, then filters results post hoc. Example: HNSW, IVFPQ with post-filtering (Shi et al., 9 Sep 2025). Performance degrades significantly as $s \to 0$ because too few valid candidates survive the filter, so the number of raw candidates retrieved must be much larger than $k$.
- Hybrid/Integrated Filtering: Integrates filter awareness into indexing and search. Examples:
  - Stitched and Filtered-DiskANN: Build Vamana graphs with label-limited edges or stitched subgraphs.
  - JAG (Joint Attribute Graphs): Builds proximity graphs with continuous filter/attribute distances for guidance, robust to arbitrary and diverse predicates (Xu et al., 10 Feb 2026).
  - Curator: Hierarchical partition-based index with embedded per-label/predicate buffers and Bloom filters for precise and efficient filtering at low selectivity (Jin et al., 3 Jan 2026).
These strategies contrast in their cost scaling, robustness, and recall under diminishing $s$.
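The contrast between the first two paradigms can be sketched with exact brute-force search standing in for an ANN index. This is an illustrative assumption, not how UNG, HNSW, or IVFPQ are implemented, and the function names are hypothetical.

```python
import heapq
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def filter_then_search(vectors, labels, query, wanted, k):
    """Pre-filter: scan only the ids that pass the predicate."""
    passing = [i for i, lab in enumerate(labels) if lab == wanted]
    return heapq.nsmallest(k, passing, key=lambda i: sq_dist(vectors[i], query))

def search_then_filter(vectors, labels, query, wanted, k, k_raw):
    """Post-filter: take the k_raw nearest overall, then drop non-matching ids."""
    raw = heapq.nsmallest(k_raw, range(len(vectors)),
                          key=lambda i: sq_dist(vectors[i], query))
    return [i for i in raw if labels[i] == wanted][:k]

rng = random.Random(0)
n, dim, k = 2000, 8, 5
vectors = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(n)]
labels = ["rare" if i % 200 == 0 else "common" for i in range(n)]  # 10 of 2000 pass
query = [0.0] * dim

exact = filter_then_search(vectors, labels, query, "rare", k)
shallow = search_then_filter(vectors, labels, query, "rare", k, k_raw=50)
deep = search_then_filter(vectors, labels, query, "rare", k, k_raw=n)
```

In this sketch `shallow` can only ever be a prefix of `exact`, since post-filtering returns whichever qualifying points happen to fall inside the raw candidate window, while `deep` recovers the full answer at the cost of ranking the whole dataset.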
3. Breakdown of Classical Structures under Low Selectivity
Graph-Based Indexes
Graph indexes (HNSW, DiskANN, ACORN) rely on high local connectivity. As $s$ decreases, the subgraph induced on the qualifying vectors fragments, leaving many qualifying vectors unreachable by traversal (Jin et al., 3 Jan 2026, Amanbayev et al., 11 Feb 2026, Li et al., 22 Aug 2025). Remedying this by increasing the average degree becomes infeasible because of construction and memory cost, and even specialized segmentation approaches (e.g., edge covering, multi-entry points) fail at sufficiently low selectivity (Li et al., 22 Aug 2025).
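The fragmentation effect can be reproduced on a toy graph. The sketch below assumes a ring in which every node links to its four closest ring neighbors, which is far simpler than an HNSW or Vamana graph, and counts connected components of the subgraph induced on the nodes that pass a filter; `induced_components` is a hypothetical helper.

```python
from collections import deque

def induced_components(adjacency, keep):
    """Count connected components of the subgraph induced on the kept nodes."""
    keep = set(keep)
    seen = set()
    components = 0
    for start in keep:
        if start in seen:
            continue
        components += 1
        seen.add(start)
        queue = deque([start])
        while queue:
            u = queue.popleft()
            for v in adjacency[u]:
                # Traversal may only step onto nodes that pass the filter.
                if v in keep and v not in seen:
                    seen.add(v)
                    queue.append(v)
    return components

n = 100
# Toy proximity graph: node i links to i-2, i-1, i+1, i+2 on a ring.
adjacency = {i: [(i + d) % n for d in (-2, -1, 1, 2)] for i in range(n)}

full = induced_components(adjacency, range(n))             # no filter
half = induced_components(adjacency, range(0, n, 2))       # 50% pass
sparse = induced_components(adjacency, range(0, n, 10))    # 10% pass
```

With half of the nodes kept, the degree-4 ring stays connected; with one node in ten kept, every qualifying node is isolated, so a traversal restricted to qualifying nodes cannot move at all.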
Partitioned and Inverted File Structures
Partition-based indexes (IVFFlat, IVFPQ) directly support pre-filtering: cluster selection or inverted lists can be efficiently intersected with filter results. Empirically, IVFPQ/IVFFlat maintain robust query latency and recall at selectivities where graph-based methods collapse (Amanbayev et al., 11 Feb 2026, Li et al., 22 Aug 2025).
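A minimal sketch of intersecting inverted lists with a filter result follows. It assumes centroids sampled directly from the data instead of trained with k-means, uncompressed vectors (so it is closer to IVFFlat than IVFPQ), and hypothetical function names; `nprobe` denotes the number of lists scanned.

```python
import heapq
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def build_inverted_lists(vectors, centroids):
    """Assign every vector id to the inverted list of its nearest centroid."""
    lists = [[] for _ in centroids]
    for vid, vec in enumerate(vectors):
        nearest = min(range(len(centroids)), key=lambda c: sq_dist(vec, centroids[c]))
        lists[nearest].append(vid)
    return lists

def prefiltered_ivf_search(query, vectors, centroids, lists, allowed, k, nprobe):
    """Scan the nprobe closest lists, keeping only ids in the allowed set."""
    probe = heapq.nsmallest(nprobe, range(len(centroids)),
                            key=lambda c: sq_dist(query, centroids[c]))
    candidates = [vid for c in probe for vid in lists[c] if vid in allowed]
    return heapq.nsmallest(k, candidates, key=lambda vid: sq_dist(vectors[vid], query))

rng = random.Random(1)
vectors = [[rng.uniform(-1, 1) for _ in range(4)] for _ in range(1000)]
centroids = rng.sample(vectors, 16)        # assumption: no k-means training
lists = build_inverted_lists(vectors, centroids)
allowed = set(range(0, 1000, 100))         # 10 ids pass the filter (s = 1%)
query = [0.0, 0.0, 0.0, 0.0]

approx = prefiltered_ivf_search(query, vectors, centroids, lists, allowed, k=3, nprobe=4)
full = prefiltered_ivf_search(query, vectors, centroids, lists, allowed, k=3, nprobe=16)
```

Because the filter is applied while scanning lists, every returned id qualifies by construction, and probing all lists degenerates into an exact scan of the qualifying set.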
Hashing/LSH-Based Indexes
Hashing approaches such as Falconn++ handle low selectivity through aggressive bucket filtering. Falconn++ applies a projection-based filter per bucket, keeping only an $\alpha$-fraction of the points, which substantially reduces the candidate pool and gives a provable reduction in the query time exponent (Pham et al., 2022). This enables scaling to much lower selectivity than classical LSH.
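The idea of filtering bucket contents by projection value can be sketched as follows. This is a loose illustration under stated assumptions, not the Falconn++ implementation: buckets are defined by the random direction with the largest inner product, the kept fraction is called `alpha`, and all function names are hypothetical.

```python
import random

def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def build_filtered_buckets(points, directions, alpha):
    """Hash each point to its best-aligned direction, then keep only the
    top alpha-fraction of each bucket ranked by projection value."""
    buckets = {j: [] for j in range(len(directions))}
    for pid, p in enumerate(points):
        scores = [dot(p, r) for r in directions]
        j = max(range(len(directions)), key=lambda idx: scores[idx])
        buckets[j].append((scores[j], pid))
    filtered = {}
    for j, members in buckets.items():
        members.sort(reverse=True)  # largest projection first
        keep = max(1, int(alpha * len(members))) if members else 0
        filtered[j] = [pid for _, pid in members[:keep]]
    return buckets, filtered

def query_bucket(query, points, directions, filtered, k):
    """Probe the query's bucket and rank its survivors by inner product."""
    j = max(range(len(directions)), key=lambda idx: dot(query, directions[idx]))
    return sorted(filtered[j], key=lambda pid: -dot(points[pid], query))[:k]

rng = random.Random(2)
dim = 6
points = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(500)]
directions = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(8)]
buckets, filtered = build_filtered_buckets(points, directions, alpha=0.1)
hits = query_bucket(points[0], points, directions, filtered, k=5)
```

The sketch shows only the trade the text describes: a smaller `alpha` shrinks every bucket, which lowers scan cost per probe but also discards points that a query might have needed.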
Tree-Based Partitioning
Curator constructs a global hierarchical $k$-means tree indexing all data, embedding per-label/predicate subindexes via buffers and Bloom filters to support low-selectivity filtering with minimal memory and update overhead (Jin et al., 3 Jan 2026). Tree expansions are sharply bounded in practice.
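How per-node Bloom filters let a traversal skip subtrees can be shown with a small sketch. It is not Curator's data structure: the tree here is hand-built instead of produced by hierarchical k-means, distance-guided ordering is omitted, and the `BloomFilter`, `Node`, and `collect` names are hypothetical.

```python
import hashlib

class BloomFilter:
    """Tiny Bloom filter over strings, stored as an integer bitmask."""
    def __init__(self, num_bits=256, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = 0

    def _positions(self, item):
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits |= 1 << pos

    def __contains__(self, item):
        # May report false positives, never false negatives.
        return all((self.bits >> pos) & 1 for pos in self._positions(item))

class Node:
    """Tree node whose Bloom filter summarises every label in its subtree."""
    def __init__(self, children=(), items=()):
        self.children = list(children)
        self.items = list(items)  # (vector_id, label) pairs, leaves only
        self.bloom = BloomFilter()
        for _, label in self.items:
            self.bloom.add(label)
        for child in self.children:
            self.bloom.bits |= child.bloom.bits

def collect(node, label, visited):
    """Return ids carrying the label, pruning subtrees the filter rules out."""
    visited.append(node)
    if label not in node.bloom:
        return []
    if not node.children:
        return [vid for vid, lab in node.items if lab == label]  # exact check
    found = []
    for child in node.children:
        found.extend(collect(child, label, visited))
    return found

leaf_a = Node(items=[(0, "red"), (1, "blue")])
leaf_b = Node(items=[(2, "green")])
leaf_c = Node(items=[(3, "red")])
root = Node(children=[Node(children=[leaf_a, leaf_b]), Node(children=[leaf_c])])

visited = []
red_ids = collect(root, "red", visited)
```

The exact label check at the leaves keeps results correct even when a Bloom filter reports a false positive; the filters only decide which subtrees are worth visiting.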
4. Filter Types, Cost Models, and Empirical Benchmarks
Supported Filter Types
Advanced methods address various filter types:
- Equality (label/attribute)
- Range (numerical, e.g., date intervals)
- Subset/Containment (multi-label, tag inclusion)
- Boolean/Complex predicates
JAG is notable for transforming each binary filter into a continuous filter distance, enabling lexicographically guided search and connectivity smoothing across low-selectivity regimes (Xu et al., 10 Feb 2026).
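One way to picture a continuous filter distance is the following sketch for a range predicate. The specific distance (how far an attribute lies outside the interval) and the helper names are assumptions chosen for illustration and are not taken from the JAG paper; only the lexicographic ordering of (filter distance, vector distance) follows the description above.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def range_filter_distance(value, low, high):
    """Zero inside [low, high], otherwise the gap to the nearest bound."""
    if value < low:
        return low - value
    if value > high:
        return value - high
    return 0.0

def guidance_key(item, query, low, high):
    """Lexicographic key: filter distance first, vector distance second."""
    vector, attribute = item
    return (range_filter_distance(attribute, low, high), sq_dist(vector, query))

items = [
    ([0.0, 0.0], 50.0),  # closest vector, attribute far outside the range
    ([1.0, 1.0], 12.0),  # inside the range
    ([0.5, 0.5], 21.0),  # just outside the range
    ([3.0, 3.0], 15.0),  # inside the range, farther away
]
query, low, high = [0.0, 0.0], 10.0, 20.0
ranked = sorted(range(len(items)), key=lambda i: guidance_key(items[i], query, low, high))
```

Unlike a binary pass/fail test, the key still orders non-qualifying points by how close they are to qualifying, which is what gives a graph traversal a direction to move in when no neighbor passes the filter.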
Cost and Recall Scaling
| Method | Query Time Scaling at Low $s$ | Recall Performance | Notes |
|---|---|---|---|
| HNSW post-filter | | Recall@10 collapses at low $s$ | Graph disconnects |
| IVFPQ/IVFFlat | | Recall remains stable at low $s$ | Partition pruning |
| Curator | | QPS improves over baseline at low $s$ | Hierarchical partition + buffers |
| JAG | Hops via multi-threshold edges | Recall maintained at low $s$ | Filter-agnostic |
| Falconn++ | | Empirically faster than Falconn | LSH with filtered buckets |
Empirical studies confirm that:
- Hybrid and partition/tree-based approaches maintain high recall/QPS as $s \to 0$, while graph-based methods degrade sharply (Li et al., 22 Aug 2025, Jin et al., 3 Jan 2026, Amanbayev et al., 11 Feb 2026, Shi et al., 9 Sep 2025).
- IVFFlat outperforms HNSW in QPS at low selectivity (Amanbayev et al., 11 Feb 2026).
5. State-of-the-Art Methods: Constructions and Innovations
Curator
Curator's dual-index design combines a global tree (hierarchical $k$-means), per-label/predicate leaf buffers, and Bloom filters at each node. Queries traverse only nodes likely to contain qualifying candidates. For complex predicates, it constructs a temporary subindex mirroring the global tree structure. Curator achieves substantial query speedups at low selectivity with limited build-time and memory overhead (Jin et al., 3 Jan 2026).
JAG
JAG generalizes graph-based methods by introducing filter/attribute distances and constructing multi-threshold proximity graphs. At query time, a lexicographic ordering provides continuous search guidance, preventing dead-ends and unifying support for label, range, subset, and Boolean constraints. JAG is the first filter-agnostic proximity graph with empirical robustness across all selectivities $s$ and filter types (Xu et al., 10 Feb 2026).
Falconn++
Falconn++ applies a low-selectivity filter within hash buckets by thresholding on the main projection coordinate, reducing each bucket to an $\alpha$-fraction of its points and trading candidate quantity for query time and recall in a controlled way. With carefully chosen parameters ($\alpha$ up to $0.1$), Falconn++ exhibits a 3–10× speedup over Falconn and matches or outperforms HNSW at high recall (Pham et al., 2022).
6. Practical Guidelines and System Integration
Findings across recent studies yield several practical rules (Amanbayev et al., 11 Feb 2026, Shi et al., 9 Sep 2025):
- Select index by expected selectivity: At low selectivity, partition/inverted-file methods or hybrid trees are preferred; at high selectivity, graph-based methods are advantageous.
- Parameterization: Graph degree, number of entry points, and ef-search must be tuned to the expected selectivity $s$. In low-$s$ regimes, an exact scan fallback may be optimal, especially if the qualifying candidates are few.
- System adaptations: Milvus employs hybrid execution and dual-priority queues for robust recall; pgvector in PostgreSQL can expose exact kNN plans via B-tree scans on filter columns, avoiding recall cliffs from post-filtering (Amanbayev et al., 11 Feb 2026).
- Empirical cost modeling: For graph methods, query cost degrades super-linearly as $s \to 0$; partition/IVF methods degrade far more gracefully.
- Robustness to filter type: Only hybrid/integrated and filter-agnostic methods such as JAG and Curator exhibit uniform throughput and recall as $s \to 0$ regardless of filter complexity.
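The selection rules above can be turned into a toy query planner. The thresholds `scan_budget` and `graph_min_selectivity`, the plan names, and the function itself are hypothetical placeholders; real cut-over points depend on the dataset, hardware, and index, and would need to be measured.

```python
def choose_plan(n, s, k, scan_budget=10_000, graph_min_selectivity=0.1):
    """Pick an execution strategy from dataset size n, selectivity s, and k.

    Assumed policy (illustrative only):
      - if the qualifying set is small enough to scan, scan it exactly;
      - otherwise use a partition index with pre-filtering at low selectivity;
      - otherwise use a graph index with post-filtering.
    """
    expected_qualifying = s * n
    if expected_qualifying <= max(k, scan_budget):
        return "exact_scan"
    if s < graph_min_selectivity:
        return "partition_prefilter"
    return "graph_postfilter"

plans = {
    "small qualifying set": choose_plan(n=1_000_000, s=0.001, k=10),
    "large data, low s": choose_plan(n=100_000_000, s=0.01, k=10),
    "large data, high s": choose_plan(n=100_000_000, s=0.5, k=10),
}
```

Keeping the exact-scan branch first reflects the observation that when only a handful of points qualify, no approximate index is needed at all.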
7. Open Challenges and Future Directions
Despite recent algorithmic progress, several open challenges remain (Li et al., 22 Aug 2025, Shi et al., 9 Sep 2025, Jin et al., 3 Jan 2026):
- Dynamic index maintenance: Supporting efficient insertions, deletions, and updates across arbitrary filter predicates remains nontrivial, especially for graph and partition-based indices.
- Auto-tuning: Determining optimal index and search hyperparameters online as $s$ changes per query/dataset is an open engineering problem.
- Universal filter-robust indices: While JAG demonstrates filter-agnostic robustness, generalizing these insights to support arbitrary, evolving predicate classes at scale is an ongoing research area.
- Cost modeling and query planning: Accurate, closed-form models of query cost as a function of selectivity are needed for query optimizers in production vector databases and hybrid RAG pipelines.
- Recall-latency tradeoff diagnostics: The GLS metric enables per-query analysis of recall loss risks, but integrating such diagnostics into production systems is only just emerging (Amanbayev et al., 11 Feb 2026).
In summary, low-selectivity filtered ANNS research has progressed from specialized and brittle solutions to robust, filter-agnostic hybrid and partition-based indices capable of maintaining high throughput and recall even at very low selectivity. Continued advances in theory, algorithm design, and system integration are essential for fully general, high-performance retrieval under arbitrary structured filtering constraints.