COMPASS Framework: Hybrid Filtered Search

Updated 7 January 2026
  • COMPASS is a unified hybrid search framework that integrates vector and relational search methods to handle filtered queries over mixed-schema datasets.
  • It employs cooperative query execution by interleaving HNSW graph traversals with IVF-based B-tree scans through a shared candidate queue to balance selectivity.
  • Empirical evaluations demonstrate significant speedups and reduced memory footprint compared to state-of-the-art baselines, highlighting its real-world viability.

COMPASS Framework

The name "COMPASS" is used for several unrelated frameworks across computer science and engineering. This article covers the COMPASS framework for general filtered search in hybrid vector–relational databases, as proposed in "Compass: General Filtered Search across Vector and Structured Data" (Ye et al., 31 Oct 2025), including its system architecture, data and query models, execution strategy, cost analysis, and empirical results.

1. Unified Hybrid Search Framework

COMPASS addresses the problem of executing general filtered search queries over datasets where each record consists of both a high-dimensional embedding and arbitrary relational (structured) attributes. The system is engineered to support filtered queries with arbitrary Boolean logic—including conjunctions, disjunctions, and range predicates—while maintaining compatibility with standard vector database and DBMS infrastructure, without bespoke or specialized index designs.

The architecture combines two established index types, coordinated through a shared queue and adaptive search control:

  • Vector index (G): A proximity-graph-based index, typically HNSW, built over all $d$-dimensional embeddings.
  • Clustered relational index (B): An IVF partitioning of vectors into $n_{list}$ clusters, each storing within-cluster B+-trees (or learned 1D indices) over numeric relational attributes.
  • Shared Candidate Queue (SharedQ): A global min-heap of (distance, recordID) tuples, fed by both G and B.
  • Progressive Search Control: Parameters ($efs$ for G, $efi$ for B) adapt at runtime to observed predicate selectivity.

This design makes it possible for COMPASS to coordinate vector-based retrieval with arbitrary relational filtering, without needing any novel index construction, and leverages off-the-shelf components present in mainstream vector DBMS architectures (Ye et al., 31 Oct 2025).
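As a rough illustration of the shared candidate queue, the following Python sketch pairs a min-heap of (distance, recordID) tuples with a visitation bitmap so that a record offered by both indices is enqueued only once. The class name `SharedQueue` and its methods are hypothetical, and folding the bitmap into the queue is a simplification made here, not a detail taken from the paper.

```python
import heapq
from typing import List, Tuple


class SharedQueue:
    """Min-heap of (distance, record_id) with a visited bitmap (illustrative only)."""

    def __init__(self, n_records: int) -> None:
        self._heap: List[Tuple[float, int]] = []
        # One byte per record; a real system would likely use a packed bitset.
        self._visited = bytearray(n_records)

    def offer(self, distance: float, record_id: int) -> bool:
        """Enqueue a candidate unless it was already seen; return True if enqueued."""
        if self._visited[record_id]:
            return False
        self._visited[record_id] = 1
        heapq.heappush(self._heap, (distance, record_id))
        return True

    def pop(self) -> Tuple[float, int]:
        """Remove and return the closest pending candidate."""
        return heapq.heappop(self._heap)

    def __len__(self) -> int:
        return len(self._heap)
```

Both the graph side and the relational side would call `offer`, which is what lets either index contribute candidates without duplicating work already done by the other.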

2. Data Model and General Query Semantics

The framework operates over a mixed-schema dataset

$$D = \{\, (v_i \in \mathbb{R}^d,\, a_i \in \mathcal{A})\,\}_{n}$$

where $v_i \in \mathbb{R}^d$ is an embedding (e.g., from images or text), and $a_i$ are structured attributes with domain $\mathcal{A} = \{A_1,\ldots, A_m\}$.

Filtered queries have the form

$$Q = (q \in \mathbb{R}^d,\, p: \mathcal{A} \rightarrow \{\mathrm{true}, \mathrm{false}\})$$

where $q$ is a query embedding and $p$ is a Boolean predicate composed of atomic clauses of the forms $A_j = c$, $A_j \leq c$, and $c_1 \leq A_j \leq c_2$, combined arbitrarily via $\wedge$ and $\vee$. The objective is to find the approximate top-$k$ records nearest to $q$ (according to a metric $\delta(\cdot, \cdot)$), restricted to those satisfying $p$.
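To make the query semantics concrete, here is a minimal Python sketch that builds a predicate from the three atomic clause forms and their conjunctions and disjunctions, then computes the exact filtered top-$k$ by brute force. The helper names (`eq`, `le`, `between`, `all_of`, `any_of`, `exact_filtered_topk`) are hypothetical, and Euclidean distance is assumed for $\delta$. This is a reference definition of the answer set, not how COMPASS executes queries.

```python
import math
from typing import Callable, Dict, List, Sequence, Tuple

Attrs = Dict[str, float]
Predicate = Callable[[Attrs], bool]
Record = Tuple[Sequence[float], Attrs]  # (embedding v_i, attributes a_i)


def eq(attr: str, c: float) -> Predicate:
    return lambda a: a[attr] == c               # A_j = c


def le(attr: str, c: float) -> Predicate:
    return lambda a: a[attr] <= c               # A_j <= c


def between(attr: str, lo: float, hi: float) -> Predicate:
    return lambda a: lo <= a[attr] <= hi        # c_1 <= A_j <= c_2


def all_of(*preds: Predicate) -> Predicate:
    return lambda a: all(p(a) for p in preds)   # conjunction


def any_of(*preds: Predicate) -> Predicate:
    return lambda a: any(p(a) for p in preds)   # disjunction


def exact_filtered_topk(data: List[Record], q: Sequence[float],
                        p: Predicate, k: int) -> List[int]:
    """Brute force: keep records satisfying p, then take the k nearest to q."""
    hits = [(math.dist(v, q), i) for i, (v, a) in enumerate(data) if p(a)]
    return [i for _, i in sorted(hits)[:k]]
```

For example, `all_of(eq("cat", 1), any_of(le("price", 20), between("price", 70, 90)))` expresses a category match combined with a disjunction of two price conditions.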

Recall is defined as

$$R = \frac{|S_k \cap S_k^*|}{|S_k^*|}$$

where $S_k$ is the returned set and $S_k^*$ is the exact filtered top-$k$.
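The recall definition translates directly into a few lines of Python. The function name `recall_at_k` is hypothetical, and returning 1.0 when the exact set is empty is a convention assumed here, since the formula is undefined in that case.

```python
from typing import Iterable


def recall_at_k(returned: Iterable[int], exact: Iterable[int]) -> float:
    """|S_k intersect S_k*| / |S_k*| for returned set S_k and exact filtered top-k S_k*."""
    exact_set = set(exact)
    if not exact_set:
        # Assumed convention: no record satisfies the predicate, so nothing was missed.
        return 1.0
    return len(set(returned) & exact_set) / len(exact_set)
```

With `returned = [1, 2, 3, 9]` and `exact = [1, 2, 3, 4]`, three of the four exact results are recovered, so the recall is 0.75.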

3. Cooperative Query Execution Strategy

At the core of COMPASS is a “candidate racing” algorithm that interleaves pull-based iterations over the HNSW (G) and IVF-within-cluster-B+-tree (B) indices, mediated by a shared heap and a bitmap for visitation tracking:

function CompassSearch(G, B, q, p, k, ef):
    SharedQ ← empty min-heap            // (distance, recordID) candidates from G and B
    Visited ← bitset[n] (all false)     // records already seen by either index
    TopQ ← empty max-heap (cap = ef)    // best predicate-passing records so far
    G.Open(q, p, SharedQ, Visited)
    B.Open(q, p, SharedQ, Visited)
    while TopQ.size < ef:
        records, sel ← G.Next(SharedQ)  // sel: local predicate pass-rate
        push all records into TopQ
        if sel < β:                     // graph neighborhood passes too rarely
            records ← B.Next(SharedQ)
            push all records into TopQ
    extract top-k from TopQ as result

  • G.Next(SharedQ): Performs progressive HNSW traversal and adapts its expansion to the local pass-rate $sel$: if $sel \geq \alpha$, it expands 1-hop neighbors; otherwise, if $sel \geq \beta$, it expands 2-hop neighbors; otherwise, it returns only the neighbors that pass the predicate.
  • B.Next(SharedQ): Performs a progressive IVF probe: it runs a small proximity search over centroids, then scans each newly reached cluster's B-tree until $efi$ predicate-passing records are found and pushed into SharedQ.
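The expansion rule used by G.Next can be sketched as a small decision function. The names `local_pass_rate` and `choose_expansion` and the mode labels are hypothetical. The default thresholds follow the typical values quoted in this article ($\alpha \approx 30\%$, $\beta \approx 5\%$).

```python
from typing import Callable, Iterable


def local_pass_rate(neighbor_ids: Iterable[int],
                    passes: Callable[[int], bool]) -> float:
    """Fraction of a node's neighbors that satisfy the predicate."""
    ids = list(neighbor_ids)
    if not ids:
        return 0.0
    return sum(1 for i in ids if passes(i)) / len(ids)


def choose_expansion(sel: float, alpha: float = 0.30, beta: float = 0.05) -> str:
    """Map the local pass-rate to an expansion mode (labels are illustrative)."""
    if sel >= alpha:
        return "one_hop"      # enough neighbors pass: plain 1-hop expansion
    if sel >= beta:
        return "two_hop"      # sparse: look one hop further for passing records
    return "filter_only"      # very sparse: keep passing neighbors, rely on B
```

A pass-rate of 0.5 selects 1-hop expansion, 0.1 selects 2-hop expansion, and 0.01 falls through to the filter-only case, which is also the condition under which the main loop consults the relational index.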

The algorithm "pivots" from vector-centric to relational-centric candidate injection whenever the local pass-rate of neighbors for the current predicate drops below a threshold $\beta$ (typically $\beta \approx 5\%$), enabling robust completion even under highly selective, multi-attribute filters.
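The pivot can be illustrated with a toy version of the racing loop. In this sketch the two indices are replaced by precomputed batches: the graph side yields `(candidates, sel)` pairs and the relational side yields candidate lists, each candidate being a `(distance, record_id)` tuple that already passes the predicate. The function name `compass_search_toy`, the batch representation, and the exhaustion checks are assumptions made so the example runs on its own. They are not taken from the paper.

```python
import heapq
from typing import Iterable, List, Set, Tuple

Candidate = Tuple[float, int]  # (distance, record_id)


def compass_search_toy(graph_batches: Iterable[Tuple[List[Candidate], float]],
                       rel_batches: Iterable[List[Candidate]],
                       k: int, ef: int, beta: float = 0.05) -> List[Candidate]:
    top: List[Tuple[float, int]] = []   # max-heap via negated distance, capped at ef
    seen: Set[int] = set()              # stands in for the Visited bitmap

    def push(batch: List[Candidate]) -> None:
        for dist, rid in batch:
            if rid in seen:
                continue
            seen.add(rid)
            if len(top) < ef:
                heapq.heappush(top, (-dist, rid))
            elif dist < -top[0][0]:
                heapq.heapreplace(top, (-dist, rid))

    g, b = iter(graph_batches), iter(rel_batches)
    g_done = b_done = False
    while len(top) < ef and not (g_done and b_done):
        sel = 0.0                        # an exhausted graph counts as "too sparse"
        if not g_done:
            step = next(g, None)
            if step is None:
                g_done = True
            else:
                batch, sel = step
                push(batch)
        if sel < beta and not b_done:    # pivot: let the relational side inject
            rel = next(b, None)
            if rel is None:
                b_done = True
            else:
                push(rel)
    return sorted((-neg, rid) for neg, rid in top)[:k]
```

The relational side is pulled only on iterations where the graph's reported pass-rate falls below `beta`, so under loose filters the loop behaves like an ordinary graph search.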

4. Cost Analysis and Dynamic Adaptation

COMPASS achieves efficiency and robustness across the spectrum from loose to extremely selective filters by dynamically balancing two core costs per query:

  • $C_{graph}$: $O(efs \cdot d)$ vector computations plus per-neighbor predicate checks, expressed in the number of HNSW traversals and filtering steps in G.
  • $C_{rel}$: $O(efi \cdot \log N_c)$ B-tree probes within clusters (where $N_c$ is the cluster size) and centroid hops in B.

When the global predicate pass-rate $r$ is high, G dominates; as $r$ decreases, the algorithm automatically leans on B for candidate generation. The amortized per-query complexity is

$$O\big( \min[efs,\, d,\, r] \cdot d \,+\, efi \cdot \log (N / n_{list}) \big)$$

Formal bounds for the HNSW search and B-tree lookup phases, including parameter choices ($efs$, $efi$, and heuristics for $\alpha$ and $\beta$), are specified to maintain recall, throughput, and resilience under compositional or highly selective filter regimes (Ye et al., 31 Oct 2025).
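For a feel of the relative magnitudes, the two per-query cost terms can be evaluated for sample parameters. The sketch below ignores constant factors and assumes a base-2 logarithm (the base does not matter asymptotically). The function names and the sample values are hypothetical and are not figures from the paper.

```python
import math


def graph_cost(efs: int, d: int) -> float:
    """Order-of-magnitude estimate of C_graph: efs distance computations of d terms each."""
    return float(efs * d)


def relational_cost(efi: int, cluster_size: int) -> float:
    """Order-of-magnitude estimate of C_rel: efi B-tree probes in a cluster of N_c records."""
    return efi * math.log2(cluster_size)


# Hypothetical setting: efs = 100, d = 128, efi = 50, N_c = 1024.
c_graph = graph_cost(100, 128)       # 100 * 128
c_rel = relational_cost(50, 1024)    # 50 * log2(1024) = 50 * 10
```

Under these assumed values the graph term is 12,800 units against 500 for the relational term, which suggests why relational probing is a cheap fallback once graph traversal stops finding predicate-passing neighbors.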

5. Implementation, Integration, and Practical Operation

COMPASS builds on widely used HNSW implementations (e.g., those in FAISS and Milvus) and on DBMS-provided B+-trees or learned 1D indices. It requires no new index structures and no changes to data storage. Instead, COMPASS can be integrated as:

  • A user-defined function or lightweight query processor within a DBMS.
  • An in-memory orchestration layer managing SharedQ and visitation masks.

For multi-attribute predicates, the system samples one attribute index at random (or by a simple cost estimate) for B-tree scan and applies in-line conjunctive checks for others.
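The cost-estimate variant of this scheme can be sketched as follows for a conjunction of range clauses within one cluster. A sorted list searched with `bisect` stands in for each B+-tree. The attribute whose range covers the fewest index entries drives the scan, and the remaining clauses are checked inline. All names here (`build_indexes`, `index_span`, `conjunctive_scan`) are hypothetical, and record ids are assumed to be non-negative integers.

```python
import bisect
from typing import Dict, List, Tuple

Range = Tuple[float, float]             # inclusive (low, high)
SortedIndex = List[Tuple[float, int]]   # (attribute value, record id), sorted


def build_indexes(records: List[Dict[str, float]]) -> Dict[str, SortedIndex]:
    """One sorted list per attribute, standing in for a per-cluster B+-tree."""
    attrs = list(records[0].keys()) if records else []
    return {a: sorted((r[a], rid) for rid, r in enumerate(records)) for a in attrs}


def index_span(index: SortedIndex, rng: Range) -> Tuple[int, int]:
    """Half-open slice [left, right) of index entries whose value lies in rng."""
    lo, hi = rng
    left = bisect.bisect_left(index, (lo, -1))
    right = bisect.bisect_right(index, (hi, float("inf")))
    return left, right


def conjunctive_scan(records: List[Dict[str, float]],
                     indexes: Dict[str, SortedIndex],
                     ranges: Dict[str, Range]) -> List[int]:
    """Scan the most selective attribute's index; check the other clauses inline."""
    spans = {a: index_span(indexes[a], r) for a, r in ranges.items()}
    driver = min(spans, key=lambda a: spans[a][1] - spans[a][0])
    left, right = spans[driver]
    hits = []
    for _, rid in indexes[driver][left:right]:
        rec = records[rid]
        if all(lo <= rec[a] <= hi for a, (lo, hi) in ranges.items() if a != driver):
            hits.append(rid)
    return hits
```

Choosing the driver by span width is one simple cost estimate. Picking the driver at random, as the text also allows, would return the same records but may scan more index entries.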

Thresholds $\alpha$ and $\beta$ parameterize the search-expansion and selectivity-pivot decision points, respectively, and are typically set to $\alpha \approx 30\%$ and $\beta \approx 5\%$. They can be tuned per workload through empirical analysis.

6. Empirical Evaluation and Performance Characteristics

COMPASS was benchmarked on four hybrid vector/relational datasets (GIST, CRAWL, GLOVE100, VIDEO; up to 2M vectors; four numeric attributes) against two state-of-the-art baselines: NaviX (general HNSW in-filtering) and SeRF×4 (specialized per-attribute vectors plus post-filtering).

Key results:

  • At recall $\geq 0.9$ and selectivity $= 30\%$ per attribute, the speedup of COMPASS over the best baseline grows with filter dimensionality up to three attributes and remains high at four: $1.5\times$ (1D), $2.6\times$ (2D), $5.8\times$ (3D), $4.7\times$ (4D).
  • Index memory footprint is 300 MiB (COMPASS), versus 900 MiB (NaviX) and 600 MiB (SeRF×4).
  • Query throughput is nondecreasing as more filters are added, a “selectivity helps” regime; in contrast, baseline QPS degrades sharply.
  • In settings with only a single attribute filter, COMPASS matches the QPS of specialized indices within 10–20%.

Empirical findings confirm that COMPASS is robust to complex filter combinations and does not suffer the combinatorial throughput collapse characteristic of in-filtering or post-filtering schemes (Ye et al., 31 Oct 2025).

7. Applications, Limitations, and Future Directions

Applications include:

  • E-commerce: filtering visually similar products by price, inventory, and category.
  • Content retrieval: searching for images or frames matching temporal, spatial, or descriptive constraints.
  • Multi-tenant vector database services requiring support for numerous arbitrary filters per query.

Limitations:

  • For records with many attributes (tens of attributes), more sophisticated intra-cluster indices (such as R-trees or KD-trees) may be beneficial.
  • For multi-attribute predicates, the current B-tree scan scheme may choose the attribute index to probe at random, which can add avoidable overhead when checking the remaining conjunctive clauses. Incorporating a lightweight cost-based planner is an open direction.

Extensibility:

  • Dynamic support for insertions/deletions can be realized by integrating dynamic-graph update algorithms to keep clusters and HNSW up to date.
  • Generalization to learned or multi-dimensional cluster-wise indices is possible for even further query optimization.

In summary, COMPASS offers a principled, practical, and scalable framework for hybrid vector–structured filtered search, directly enabling efficient approximate top-$k$ retrieval for arbitrary conjunctive, disjunctive, and range predicates with only off-the-shelf indexing infrastructure, across a broad spectrum of database workloads (Ye et al., 31 Oct 2025).
