Proximity Graphs with Filter Support
- Proximity graphs with filter support are advanced data structures that embed user-defined filters directly into graph indexing for constraint-driven similarity search.
- Key methodologies include filter-agnostic online filtering, segment-tree based multi-graph approaches, and joint attribute graphs that combine vector and filter distances.
- Empirical evaluations show these approaches offer significant efficiency gains in range-filtered ANN and durable pattern mining, achieving robust performance under varied constraints.
A proximity graph with filter support is a graph-based data structure or index designed to efficiently answer similarity or pattern queries under user-specified constraints or filters. These filters often operate over attribute metadata, lifespans, or arbitrary user predicates, and are integrated directly into the graph construction or traversal algorithms. This approach enables single-stage search processes for tasks such as attribute-constrained nearest neighbor search, range-filtered similarity search, temporal pattern mining under durability constraints, and robust support for arbitrary filter types and selectivities. The following sections review the principal methodologies and theoretical innovations underpinning proximity graphs with filter support, representative solution frameworks, analytical properties, and empirical findings in the literature.
1. Formal Problem Models and Taxonomy
The general proximity graph with filter support problem can be formalized as follows:
- Given a dataset of $n$ vectors and corresponding attribute tuples or temporal labels,
- Given a dissimilarity function $d$ (commonly Euclidean or cosine),
- Given a filter or predicate $f$, often defined over a structured or semantic domain,
- Find the top-$k$ nearest neighbors to a query $q$ among those satisfying $f$, or enumerate subgraphs that satisfy additional filter conditions (such as temporal durability or aggregate attribute coverage) (Zhao et al., 2022, Agarwal et al., 2024, Xu et al., 2024, Xu et al., 10 Feb 2026).
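As a point of reference for the filtered top-k formulation above, the following is a minimal brute-force sketch, not any of the cited methods. It scans every item, so it is exact but linear in the dataset size. The function name `filtered_top_k`, the toy data, and the `year` attribute are illustrative assumptions.

```python
import heapq
import math

def filtered_top_k(vectors, attrs, query, predicate, k):
    """Exact filtered k-NN by linear scan.

    Only items whose attribute record passes `predicate` are candidates;
    the k closest of those (Euclidean distance) are returned as indices.
    """
    candidates = (
        (math.dist(v, query), i)
        for i, (v, a) in enumerate(zip(vectors, attrs))
        if predicate(a)
    )
    return [i for _, i in heapq.nsmallest(k, candidates)]

# Toy data (hypothetical): four 2-D vectors, each with a "year" attribute.
vectors = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0), (3.0, 3.0)]
attrs = [{"year": 2019}, {"year": 2021}, {"year": 2022}, {"year": 2023}]

# Index 0 is closest to the query but fails the filter, so it is excluded.
result = filtered_top_k(vectors, attrs, (0.0, 0.0), lambda a: a["year"] >= 2021, k=2)
print(result)  # [1, 2]
```

A scan like this is the ground truth against which the recall of a filtered graph index is typically measured.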
These filter types include:
- Arbitrary user predicates: as in AIRSHIP, where the filter is a black-box function (Zhao et al., 2022).
- Numeric attribute ranges: e.g., RFANN queries that restrict a numeric attribute to a query interval (Xu et al., 2024).
- Equality, Boolean, and subset-based filters: e.g., label, tag or attribute conjunction filters (Xu et al., 10 Feb 2026).
- Temporal durability constraints: where both structure and persistence (e.g., intersection of lifespans) define valid subgraphs (Agarwal et al., 2024).
A table of prominent scenarios:
| Task Type | Filter Type | Key Reference |
|---|---|---|
| Constrained ANN | Boolean/user predicate | (Zhao et al., 2022) |
| Range-filtered ANN | Numeric range | (Xu et al., 2024) |
| Robust filtered ANN (multi-type) | Arbitrary | (Xu et al., 10 Feb 2026) |
| Durable pattern mining | Temporal/lifespan | (Agarwal et al., 2024) |
2. Graph Structures: Index Construction and Attribute Integration
Three primary strategies have emerged to couple proximity graph construction with filter-awareness:
2.1 Filter-Agnostic Graph with Online Filtering
Classical proximity graphs (e.g., HNSW or Vamana) are constructed ignoring filters; filters are enforced only during query traversal or as a post-processing step. This allows arbitrary filters without index change but incurs efficiency and recall penalties under selective filters (Zhao et al., 2022).
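The idea of enforcing the filter only during traversal can be sketched as follows. This is a simplified illustration under stated assumptions, not the HNSW or Vamana search routine: the graph is a plain adjacency dictionary, the search is bounded by a hypothetical `max_expansions` budget rather than a real termination rule, and the predicate takes a node id.

```python
import heapq
import math

def filtered_graph_search(graph, vectors, query, predicate, entry, k, max_expansions=100):
    """Best-first search over a filter-agnostic graph.

    Traversal passes through all nodes, matching or not; the predicate
    only decides whether a visited node may enter the result set.
    """
    def dist(i):
        return math.dist(vectors[i], query)

    visited = {entry}
    frontier = [(dist(entry), entry)]   # min-heap of nodes still to expand
    results = []                        # max-heap via negated distance, size <= k
    expansions = 0
    while frontier and expansions < max_expansions:
        d, node = heapq.heappop(frontier)
        expansions += 1
        if predicate(node):
            heapq.heappush(results, (-d, node))
            if len(results) > k:
                heapq.heappop(results)  # drop the farthest match
        for nb in graph[node]:
            if nb not in visited:
                visited.add(nb)
                heapq.heappush(frontier, (dist(nb), nb))
    # Return matching node ids, nearest first.
    return [n for _, n in sorted((-neg_d, n) for neg_d, n in results)]

# Toy data (hypothetical): six points on a line, each linked to its neighbors.
vectors = [(float(i), 0.0) for i in range(6)]
graph = {i: [j for j in (i - 1, i + 1) if 0 <= j < 6] for i in range(6)}
matches = filtered_graph_search(
    graph, vectors, (0.0, 0.0), lambda i: i % 2 == 0, entry=5, k=2
)
print(matches)  # [0, 2]
```

In this toy run half of the expanded nodes fail the predicate and contribute nothing to the result. Under a highly selective filter that wasted work dominates, which is the efficiency penalty described above.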
2.2 Multi-Graph or Segment-Tree Approaches
For range or categorical filters, a collection of "elemental" proximity graphs is precomputed, each indexing a subset (segment) of the dataset according to the attribute's value, such as segment tree partitions for contiguous ranges. At query time, a valid subgraph is dynamically assembled by on-the-fly union or traversal of relevant segment graphs. Because each point appears in one segment per tree level, space complexity is $O(nm \log n)$ for $n$ points and maximum degree $m$ (Xu et al., 2024).
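The segment-tree decomposition behind this approach can be illustrated with a short sketch. It shows only how a query range over attribute-sorted positions is tiled by a small number of tree segments, each of which would own one elemental graph. It does not reproduce the edge-selection logic of iRangeGraph, and the function name `canonical_segments` is an assumption.

```python
def canonical_segments(lo, hi, node_lo, node_hi):
    """Return the segment-tree nodes, as inclusive index ranges, that exactly tile [lo, hi].

    The tree is implicit: a node covering [node_lo, node_hi] has children
    covering the left and right halves of that range.
    """
    if hi < node_lo or node_hi < lo:
        return []                        # no overlap with the query range
    if lo <= node_lo and node_hi <= hi:
        return [(node_lo, node_hi)]      # node fully inside the query range
    mid = (node_lo + node_hi) // 2
    return (canonical_segments(lo, hi, node_lo, mid)
            + canonical_segments(lo, hi, mid + 1, node_hi))

# Eight points sorted by attribute value; query covers sorted positions 2..6.
segments = canonical_segments(2, 6, 0, 7)
print(segments)  # [(2, 3), (4, 5), (6, 6)]
```

A range over $n$ sorted positions decomposes into $O(\log n)$ such segments, so only a logarithmic number of precomputed graphs is relevant to any one query.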
2.3 Joint Attribute Graphs (JAG) Framework
Attributes are mapped to continuous "attribute distances" and "filter distances," producing a unified structure. At each construction layer, the usual vector distance is combined lexicographically with capped attribute distances under a set of thresholds. Edges are allocated and pruned such that the resulting index remains robust to different filter selectivities and types (Xu et al., 10 Feb 2026).
In the temporal/durable graph case, filter support is enabled structurally through interval trees and cover trees/quadtree indexing over both vector and temporal/lifespan axes (Agarwal et al., 2024).
3. Filter-Aware Query Algorithms
Efficient query processing in proximity graphs with filter support leverages either explicit or implicit filter integration:
3.1 AIRSHIP: Constrained Search with User-Defined Functions
Search begins from sampled "seed" points known to satisfy the filter (found by testing a sample of the dataset against it) and employs a two-directional traversal using two priority queues: one for filter-satisfying nodes and one for all others. A mixing-ratio heuristic enforces a balance between exploitation within filter-satisfying clusters and exploration into neighborhoods that do not yet satisfy the filter. Nodes are added to the result heap only if they satisfy the filter (Zhao et al., 2022).
3.2 iRangeGraph: Dynamic Range-Constrained Traversal
For RFANN, a search is initialized from the median rank of the range-filtered interval. Edges for each node are selected on-the-fly from the $O(\log n)$ elemental graphs covering the relevant range. The beam search is performed only over nodes within the query range, with expansion determined by the segment tree (Xu et al., 2024).
3.3 JAG: Unified Greedy-Search Over Attribute and Filter Distances
JAG applies a greedy beam search over a single graph. At query time, candidate neighbors are ranked lexicographically by filter distance and then vector distance, ensuring traversal is guided toward filter-satisfying regions while maintaining vector similarity. Because edge selection at index construction covers all relevant attribute thresholds, no dead-ends arise for any filter type or sparsity (Xu et al., 10 Feb 2026).
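A lexicographic ranking of this kind can be sketched as follows. This is an illustration under explicit assumptions, not the JAG implementation: it assumes a numeric range filter, a hypothetical filter distance equal to the gap between an attribute value and the range, and a ranking key of (filter distance, vector distance). The names `range_filter_distance` and `rank_candidates` are invented for the example.

```python
import math

def range_filter_distance(value, lo, hi):
    """Hypothetical filter distance: 0 inside [lo, hi], else the gap to the nearest bound."""
    if value < lo:
        return lo - value
    if value > hi:
        return value - hi
    return 0.0

def rank_candidates(candidates, query_vec, lo, hi):
    """Sort by (filter distance, vector distance); Python tuples compare lexicographically."""
    return sorted(
        candidates,
        key=lambda c: (
            range_filter_distance(c["attr"], lo, hi),
            math.dist(c["vec"], query_vec),
        ),
    )

# Toy candidates (hypothetical): "b" is nearest in vector space but violates the range.
candidates = [
    {"id": "a", "attr": 5, "vec": (3.0, 0.0)},
    {"id": "b", "attr": 12, "vec": (0.1, 0.0)},
    {"id": "c", "attr": 7, "vec": (1.0, 0.0)},
]
order = [c["id"] for c in rank_candidates(candidates, (0.0, 0.0), lo=4, hi=8)]
print(order)  # ['c', 'a', 'b']
```

Candidates that satisfy the filter always come first, ordered by vector distance. Violating candidates are still ranked rather than discarded, so the search can step through them toward a satisfying region.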
3.4 Enumeration of Durable Patterns
Temporal proximity graphs support queries for $\tau$-durable patterns (triangles, paths, etc.) via near-linear time algorithms, using interval trees for lifespan overlap and cover/quadtree structures for proximity (Agarwal et al., 2024). Incremental data structures enable reporting only new patterns as durability thresholds are changed interactively.
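To make the durability condition concrete, here is a brute-force sketch for triangles. It is cubic in the number of points and stands in for the definition only, not for the near-linear algorithms of the cited work. It assumes two points are adjacent when their Euclidean distance is at most `radius`, that each lifespan is a `(start, end)` interval, and that a pattern's durability is the length of the intersection of its members' lifespans.

```python
from itertools import combinations
import math

def durable_triangles(points, lifespans, radius, tau):
    """Enumerate triangles whose vertices are pairwise close and co-exist for at least tau."""
    found = []
    for i, j, k in combinations(range(len(points)), 3):
        pairwise_close = all(
            math.dist(points[a], points[b]) <= radius
            for a, b in ((i, j), (j, k), (i, k))
        )
        # Lifespan intersection: latest start to earliest end.
        start = max(lifespans[v][0] for v in (i, j, k))
        end = min(lifespans[v][1] for v in (i, j, k))
        if pairwise_close and end - start >= tau:
            found.append((i, j, k))
    return found

# Toy data (hypothetical): three nearby points and one distant point.
points = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (5.0, 5.0)]
lifespans = [(0, 10), (2, 8), (4, 12), (0, 20)]

# Points 0, 1, 2 overlap on [4, 8], a durability of 4.
triangles = durable_triangles(points, lifespans, radius=1.5, tau=3)
print(triangles)  # [(0, 1, 2)]
```

Raising `tau` above 4 removes the only triangle here, which is the kind of threshold change the incremental structures mentioned above are designed to handle without recomputing from scratch.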
4. Complexity, Robustness, and Empirical Evaluation
4.1 Complexity
- AIRSHIP: For sample size 2 and 3 visited nodes, 4 query time, usually with 5 (Zhao et al., 2022).
- iRangeGraph: Space and build time 6; per-node edge-selection in 7, beam search of 8 candidates for top-9 queries (Xu et al., 2024).
- JAG: Index build time 0 (1beam size, 2degree, 3#thresholds), query time 4 (Xu et al., 10 Feb 2026).
- Durable patterns: Preprocessing 5, pattern enumeration and update cost 6 (Agarwal et al., 2024).
4.2 Robustness Across Filter Types
- JAG achieves recall and throughput that remain robust to filter type, selectivity, and correlation with embedding similarity. Attribute and filter distances ensure the graph is navigable under arbitrary constraints (Xu et al., 10 Feb 2026).
- iRangeGraph matches "oracle" (pre-materialized) approaches in recall and QPS, but with feasible space and construction cost (Xu et al., 2024).
- AIRSHIP demonstrates throughput improvements of 7–8 over post-filtered HNSW for 9 selectivities (Zhao et al., 2022).
- Durable pattern enumeration scales linearly on large (0) proximity graphs (Agarwal et al., 2024).
Empirical evaluations consistently focus on large-scale benchmarks such as SIFT1M, MNIST, LAION, YFCC10M, and web-scale retrieval contexts (Zhao et al., 2022, Xu et al., 2024, Xu et al., 10 Feb 2026).
5. Optimization Strategies and Parameterization
Key algorithmic strategies and their parameter considerations include:
- Seed sampling: For filters with retention 1, sample 2 to have about 3 satisfied seeds (AIRSHIP) (Zhao et al., 2022).
- Mixing ratio 4: Adaptive 5-balancing in AIRSHIP for optimal trade-off between cluster exploitation and exploration (Zhao et al., 2022).
- Segment tree design: Balancing segment size and depth determines iRangeGraph's index size and edge-redundancy (Xu et al., 2024).
- Joint threshold selection: Multiple attribute thresholds in JAG's construction (6) enable consistent performance across selectivities; 3-4 thresholds empirically suffice (Xu et al., 10 Feb 2026).
- Beam and degree parameters: Search and build beam sizes (7), and graph degrees (8) define the QPS/recall/space trade-off envelope (Zhao et al., 2022, Xu et al., 2024, Xu et al., 10 Feb 2026).
Recommended settings (for 9 or moderate selectivity): 0 (graph degree), 1 (sample size), 2 matched to graph neighbor statistics, 3–4 (degree in iRangeGraph), 5–6 (thresholds in JAG).
6. Extensions and Applications
Proximity graphs with filter support generalize to:
- Multi-attribute filtered search (e.g., combination of range and categorical attributes), with probabilistic edge sampling and query-guided neighbor selection (Xu et al., 2024, Xu et al., 10 Feb 2026).
- Temporal networks and mining of resilient or persistent structures, enabled by combining proximity and interval/durability constraints, supporting interactive pattern analytics (Agarwal et al., 2024).
- Robust integration in retrieval and recommendation systems where filters range from simple tags to arbitrary complex Boolean logic (Zhao et al., 2022, Xu et al., 10 Feb 2026).
A plausible implication is that filter-supporting proximity graphs increasingly serve as the backbone for large-scale vector search systems that must efficiently answer filtered queries at web scale, without precomputing dedicated indices for every possible filter configuration.
7. Comparative Summary
A synthesis of modern approaches:
| Method | Filter Support | Index Structure | Complexity | Recall/QPS Robustness |
|---|---|---|---|---|
| AIRSHIP | Arbitrary user predicate | Single proximity graph + seeds | 7 | High (8), scalable |
| iRangeGraph | Numeric range | 9 segment graphs | 0 | Near-oracle with 1 |
| JAG | Arbitrary (label, range, subset, Boolean) | Single graph with attribute distances | 2 | Uniform; outperforms prior state of the art |
| Durable Graph | Temporal/durability | Cover tree + interval trees | 3 | Scalable to millions; exact in 4 |
These architectures collectively establish the state of the art in filter-aware similarity and pattern search. Their reported empirical and theoretical performance nearly matches that of filter-specialized or filter-naive oracle baselines, at practical space and compute budgets. For full constructions, algorithms, and analytical proofs, see the cited works (Zhao et al., 2022, Agarwal et al., 2024, Xu et al., 2024, Xu et al., 10 Feb 2026).