COMPASS Framework: Hybrid Filtered Search

Updated 7 January 2026

COMPASS is a unified hybrid search framework that integrates vector and relational search methods to handle filtered queries over mixed-schema datasets.
It executes queries cooperatively, interleaving HNSW graph traversals with B-tree scans inside IVF clusters through a shared candidate queue, and shifts work between the two according to the observed predicate selectivity.
In the reported experiments it runs up to 5.8× faster than the best baseline at recall ≥ 0.9 with multi-attribute filters, and its index is smaller than those of the NaviX and SeRF×4 baselines.

COMPASS Framework

The name "COMPASS" is used for several unrelated frameworks in computer science and engineering. This article covers the COMPASS framework for general filtered search in hybrid vector–relational databases, proposed in "Compass: General Filtered Search across Vector and Structured Data" (Ye et al., 31 Oct 2025). It describes the system architecture, the data and query model, the execution strategy, the cost analysis, and the empirical behavior.

1. Unified Hybrid Search Framework

COMPASS executes general filtered search queries over datasets in which each record consists of both a high-dimensional embedding and arbitrary relational (structured) attributes. It supports filters with arbitrary Boolean logic, including conjunctions, disjunctions, and range predicates, while remaining compatible with standard vector database and DBMS infrastructure and requiring no bespoke or specialized index designs.

The architecture combines two standard index types with two coordination mechanisms:

Vector index (G): A proximity-graph index, typically HNSW, built over all $d$-dimensional embeddings.
Clustered relational index (B): An IVF partitioning of the vectors into $n_{list}$ clusters, where each cluster stores B+-trees (or learned 1D indices) over its numeric relational attributes.
Shared Candidate Queue (SharedQ): A global min-heap of (distance, recordID) tuples, fed by both G and B.
Progressive Search Control: The search parameters ($efs$ for G, $efi$ for B) adapt at runtime to the observed predicate selectivity.
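
The role of SharedQ together with the visitation bitmap (the Visited bitset in the pseudocode of Section 3) can be sketched as a min-heap that both producers feed, with the bitmap preventing a record reached by both G and B from being enqueued twice. This is a minimal sketch under that assumption; the class and method names are hypothetical, not taken from the paper.

```python
import heapq


class SharedQueue:
    """Hypothetical SharedQ: a min-heap of (distance, record_id) fed by two producers."""

    def __init__(self, n):
        self.heap = []               # min-heap ordered by distance to the query
        self.visited = bytearray(n)  # one flag per record, stand-in for bitset[n]

    def push(self, dist, rid):
        # Ignore records that either index has already enqueued.
        if self.visited[rid]:
            return False
        self.visited[rid] = 1
        heapq.heappush(self.heap, (dist, rid))
        return True

    def pop_nearest(self):
        # Closest pending candidate, or None when the queue is empty.
        return heapq.heappop(self.heap) if self.heap else None


q = SharedQueue(n=10)
q.push(0.8, 3)  # candidate produced by the graph index G
q.push(0.2, 7)  # candidate produced by the relational index B
q.push(0.8, 3)  # same record reached again via B: ignored
print(q.pop_nearest())  # (0.2, 7)
```

A single bitmap shared by both producers keeps deduplication at O(1) per candidate regardless of which index found the record first.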

This design lets COMPASS coordinate vector retrieval with arbitrary relational filtering without building any new index type, reusing off-the-shelf components already present in mainstream vector DBMS architectures (Ye et al., 31 Oct 2025).

2. Data Model and General Query Semantics

The framework operates over a mixed-schema dataset

$D = \{(v_i, a_i)\}_{i=1}^{n}$

where $v_i \in \mathbb{R}^d$ is an embedding (e.g., of an image or a text passage) and $a_i \in \mathcal{A}$ holds the values of the structured attributes $A_1, \ldots, A_m$.

Filtered queries have the form

$Q = (q \in \mathbb{R}^d,\, p: \mathcal{A} \rightarrow \{\mathrm{true}, \mathrm{false}\})$

where $q$ is a query embedding and $p$ is a Boolean predicate composed of atomic clauses of the forms $A_j = c$, $A_j \leq c$, and $c_1 \leq A_j \leq c_2$, combined arbitrarily via $\wedge$ and $\vee$. The objective is to find the approximate top-$k$ records nearest to $q$ under a distance metric $\delta(\cdot, \cdot)$, restricted to records whose attributes satisfy $p$.
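
To make the query semantics concrete, the following brute-force sketch computes the exact filtered top-$k$ result by scanning every record. It assumes Euclidean distance for $\delta$ and encodes predicates as nested tuples; that encoding is an illustrative choice, not the paper's. It is only a ground-truth reference, not how COMPASS executes queries.

```python
import heapq
import math

# Hypothetical predicate encoding:
#   ("eq", j, c), ("le", j, c), ("range", j, c1, c2), ("and", p1, ...), ("or", p1, ...)
def evaluate(pred, attrs):
    op = pred[0]
    if op == "eq":
        return attrs[pred[1]] == pred[2]
    if op == "le":
        return attrs[pred[1]] <= pred[2]
    if op == "range":
        return pred[2] <= attrs[pred[1]] <= pred[3]
    if op == "and":
        return all(evaluate(p, attrs) for p in pred[1:])
    if op == "or":
        return any(evaluate(p, attrs) for p in pred[1:])
    raise ValueError(f"unknown operator: {op}")


def exact_filtered_topk(records, q, pred, k):
    """IDs of the k records nearest to q (Euclidean) among those satisfying pred."""
    passing = (
        (math.dist(q, v), rid)
        for rid, (v, attrs) in enumerate(records)
        if evaluate(pred, attrs)
    )
    return [rid for _, rid in heapq.nsmallest(k, passing)]


# Each record: (embedding, (A_0, A_1)).
records = [
    ((0.0, 0.0), (10, 1)),
    ((1.0, 0.0), (50, 2)),
    ((0.1, 0.1), (90, 1)),
    ((2.0, 2.0), (20, 3)),
]
# p = (20 <= A_0 <= 60) OR (A_1 = 1)
p = ("or", ("range", 0, 20, 60), ("eq", 1, 1))
print(exact_filtered_topk(records, (0.0, 0.0), p, k=2))  # [0, 2]
```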

Recall is defined as

$R = \frac{|S_k \cap S_k^*|}{|S_k^*|}$

where $S_k$ is the returned set and $S_k^*$ is the exact filtered top-$k$ set.
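
The recall definition translates directly into code. The record IDs below are made up to illustrate the arithmetic, and the value returned when no record satisfies $p$ is an assumption, since the definition leaves that case open.

```python
def recall(returned, exact):
    """R = |S_k ∩ S_k*| / |S_k*| for an approximate result S_k and exact result S_k*."""
    exact = set(exact)
    if not exact:
        return 1.0  # assumption: nothing satisfies p, so nothing can be missed
    return len(set(returned) & exact) / len(exact)


# 3 of the 4 exact filtered neighbors were retrieved.
print(recall([4, 9, 2, 11], [4, 2, 11, 7]))  # 0.75
```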

3. Cooperative Query Execution Strategy

At the core of COMPASS is a “candidate racing” algorithm that interleaves pull-based iterations over the HNSW index (G) and the IVF index with per-cluster B+-trees (B), coordinated through a shared heap and a visitation bitmap:

function CompassSearch(G, B, q, p, k, ef):
    SharedQ ← empty min-heap
    Visited ← bitset[n] (all false)
    TopQ ← empty max-heap (cap=ef)
    G.Open(q, p, SharedQ, Visited)
    B.Open(q, p, SharedQ, Visited)
    while TopQ.size < ef:
        records, sel = G.Next(SharedQ)
        push all records into TopQ
        if sel < β:
            records = B.Next(SharedQ)
            push all records into TopQ
    extract top-k from TopQ as result
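
The control flow of the pseudocode above can be exercised with a small runnable sketch in which both indices are replaced by hypothetical stub callables: `g_next()` returns a batch of (distance, record_id) pairs plus the observed local pass rate, and `b_next()` returns a batch of pairs. TopQ is a bounded max-heap implemented by negating distances. The exhaustion guard (stop when neither index yields anything) and the `([], 1.0)` value returned by an exhausted graph stub are assumptions added so the loop always terminates; they are not stated in the pseudocode.

```python
import heapq


def compass_search(g_next, b_next, k, ef, beta=0.05):
    """Sketch of the candidate-racing loop with stubbed index iterators."""
    top_q = []  # bounded max-heap of (-distance, record_id), capacity ef

    def push_all(records):
        for dist, rid in records:
            if len(top_q) < ef:
                heapq.heappush(top_q, (-dist, rid))
            elif dist < -top_q[0][0]:
                heapq.heapreplace(top_q, (-dist, rid))  # evict the farthest entry

    while len(top_q) < ef:
        records, sel = g_next()
        push_all(records)
        produced = bool(records)
        if sel < beta:  # graph neighborhood too selective: also pull from B
            b_records = b_next()
            push_all(b_records)
            produced = produced or bool(b_records)
        if not produced:  # added guard: both sources are exhausted
            break
    ranked = sorted((-neg_dist, rid) for neg_dist, rid in top_q)
    return [rid for _, rid in ranked[:k]]


# Hypothetical stubs: the first graph batch is very selective, so B is consulted.
g_batches = iter([([(0.5, 1)], 0.02), ([(0.9, 2)], 0.40)])
b_batches = iter([[(0.3, 7), (0.7, 8)]])

def g_next():
    return next(g_batches, ([], 1.0))  # exhausted: no records, neutral pass rate

def b_next():
    return next(b_batches, [])

print(compass_search(g_next, b_next, k=2, ef=3))  # [7, 1]
```

In the example, the first graph batch has pass rate 0.02 < β, so B contributes two candidates and TopQ reaches its capacity of 3 after a single round.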

G.Next(SharedQ): Performs a progressive HNSW traversal whose expansion depends on the local pass rate $sel$: if $sel \geq \alpha$, it expands 1-hop neighbors; if $\beta \leq sel < \alpha$, it expands 2-hop neighbors; otherwise it returns only the neighbors that pass the predicate.
B.Next(SharedQ): Runs a progressive IVF probe: a small proximity search over the centroids, followed by a scan of each newly reached cluster’s B-tree until $efi$ predicate-passing records have been found and pushed into SharedQ.
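
A minimal sketch of the progressive IVF probe behind B.Next, under simplifying assumptions: centroids are given, each cluster keeps a sorted Python list of (attribute value, record_id) pairs searched with `bisect` as a stand-in for its B+-tree, centroids are ranked by brute force rather than by a proximity search, and the predicate is a single range clause $lo \leq A \leq hi$. The generator interface (one `next()` per B.Next call) and all names are illustrative.

```python
import bisect
import math


def build_clusters(vectors, attr, centroids):
    """Assign records to their nearest centroid; keep per-cluster (value, id) lists sorted."""
    clusters = [[] for _ in centroids]
    for rid, v in enumerate(vectors):
        c = min(range(len(centroids)), key=lambda i: math.dist(v, centroids[i]))
        clusters[c].append((attr[rid], rid))
    for cl in clusters:
        cl.sort()
    return clusters


def b_next_batches(q, vectors, centroids, clusters, lo, hi, efi):
    """Visit clusters nearest-centroid first; yield batches of efi passing candidates."""
    order = sorted(range(len(centroids)), key=lambda i: math.dist(q, centroids[i]))
    batch = []
    for c in order:
        cl = clusters[c]
        start = bisect.bisect_left(cl, (lo, -1))  # first entry with value >= lo
        for value, rid in cl[start:]:
            if value > hi:
                break  # sorted order: no later entry can pass
            batch.append((math.dist(q, vectors[rid]), rid))
            if len(batch) == efi:
                yield batch
                batch = []
    if batch:
        yield batch


vectors = [(0.0, 0.0), (0.2, 0.1), (5.0, 5.0), (5.1, 4.9), (0.1, 0.3)]
price = [15, 40, 25, 30, 80]
centroids = [(0.0, 0.0), (5.0, 5.0)]
clusters = build_clusters(vectors, price, centroids)
batches = b_next_batches((0.0, 0.0), vectors, centroids, clusters, lo=20, hi=50, efi=2)
print([rid for _, rid in next(batches)])  # [1, 2]
```

The first batch combines one match from the nearest cluster with one from the next cluster, illustrating how the probe moves outward until $efi$ passing records are found.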

The algorithm “pivots” from graph-driven to relational-driven candidate injection whenever the fraction of neighbors passing the current predicate drops below the threshold $\beta$ (typically $\beta \approx 5\%$). The search can therefore still collect enough passing candidates under highly selective, multi-attribute filters, where graph neighborhoods alone would yield few matches.
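
The threshold logic for G.Next and the pivot rule can be sketched as a decision function plus a 1-hop or 2-hop neighbor expansion over an adjacency list. This is one illustrative reading of the text, assuming $sel$ is the fraction of a node's direct neighbors that pass $p$; the adjacency dict and function names are hypothetical, and a real implementation would operate on the HNSW layer graph.

```python
def expansion_mode(sel, alpha=0.30, beta=0.05):
    """Map the local pass rate sel to an expansion strategy."""
    if sel >= alpha:
        return "one-hop"
    if sel >= beta:
        return "two-hop"
    return "pivot"  # too selective: also pull candidates from the relational index B


def expand(node, adj, passes, alpha=0.30, beta=0.05):
    """Return (passing candidates, sel, mode) for one expansion step from node."""
    nbrs = adj[node]
    sel = sum(passes(n) for n in nbrs) / len(nbrs) if nbrs else 0.0
    mode = expansion_mode(sel, alpha, beta)
    cands = [n for n in nbrs if passes(n)]
    if mode == "two-hop":
        # Look one hop further so that enough passing nodes are reached.
        seen = set(nbrs) | {node}
        for n in nbrs:
            for m in adj[n]:
                if m not in seen:
                    seen.add(m)
                    if passes(m):
                        cands.append(m)
    return cands, sel, mode


adj = {0: [1, 2, 3, 4], 1: [5, 6], 2: [6, 7], 3: [], 4: [8]}
allowed = {2, 5, 7}
cands, sel, mode = expand(0, adj, lambda n: n in allowed)
print(mode, sel, cands)  # two-hop 0.25 [2, 5, 7]
```

Here only one of four direct neighbors passes ($sel = 0.25$, between $\beta$ and $\alpha$), so the 2-hop expansion recovers two more passing nodes that a 1-hop step would have missed.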

4. Cost Analysis and Dynamic Adaptation

COMPASS achieves efficiency and robustness across the spectrum from loose to extremely selective filters by dynamically balancing two core costs per query:

$C_{graph}$: $O(efs \cdot d)$ for vector distance computations, plus per-neighbor predicate checks, during HNSW traversal and filtering in G.
$C_{rel}$: $O(efi \cdot \log N_c)$ for B-tree probes within clusters (where $N_c$ is the cluster size), plus centroid hops in B.
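
For intuition about how the two cost terms scale, they can be evaluated for concrete parameter values. The sketch keeps only the leading terms named above: $efs \cdot d$ scalar operations for G and $efi \cdot \log_2 N_c$ B-tree levels for B. It drops constant factors, predicate checks, and centroid hops, and it assumes balanced clusters so that $N_c \approx n / n_{list}$. The parameter values are illustrative, not taken from the paper.

```python
import math


def cost_terms(efs, d, efi, n, n_list):
    """Leading-order counts for C_graph = efs*d and C_rel = efi*log2(N_c)."""
    n_c = n / n_list              # balanced-cluster assumption
    c_graph = efs * d             # distance computations during graph traversal
    c_rel = efi * math.log2(n_c)  # B-tree descents within clusters
    return c_graph, c_rel


# About 1M vectors of dimension 128 split into 1024 IVF clusters.
g, r = cost_terms(efs=64, d=128, efi=64, n=1_048_576, n_list=1024)
print(g, r)  # 8192 640.0
```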

When global predicate pass-rate $r$ is high, G dominates; as $r$ decreases, the algorithm automatically leans on B for candidate generation. The amortized per-query complexity is

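$O\big(efs \cdot d \,+\, efi \cdot \log (N / n_{list})\big)$

where $N$ is the number of records (the $n$ of Section 2), so that $N / n_{list}$ approximates the cluster size $N_c$ under balanced clustering.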

The paper specifies formal bounds for the HNSW search and B-tree lookup phases, together with heuristics for choosing $efs$, $efi$, $\alpha$, and $\beta$, so as to maintain recall and throughput under compositional or highly selective filters (Ye et al., 31 Oct 2025).

5. Implementation, Integration, and Practical Operation

COMPASS builds on standard HNSW implementations (such as those in FAISS and Milvus) and on DBMS-provided B+-trees or learned 1D indices. It requires no new index structures or changes to data storage, and can be integrated as:

A user-defined function or lightweight query processor within a DBMS.
An in-memory orchestration layer managing SharedQ and visitation masks.

For multi-attribute predicates, the system picks one attribute index to drive the B-tree scan, either at random or by a simple cost estimate, and checks the remaining conjuncts in-line on each scanned record.
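
The driving-attribute strategy can be sketched for a conjunction of range clauses within one cluster. The text allows choosing the scanned attribute at random or by a simple cost estimate; this sketch uses one possible estimate, the number of entries each clause matches in its sorted per-attribute list (computed with `bisect`), which is an assumption rather than the paper's estimator. Sorted lists stand in for B+-trees, the `rows` dict stands in for fetching a record's attributes, and disjunctions are not handled.

```python
import bisect


def conjunctive_scan(sorted_idx, rows, clauses):
    """
    sorted_idx: {attr: sorted list of (value, rid)}, a stand-in for per-attribute B+-trees
    rows:       {rid: {attr: value}}, used for in-line checks
    clauses:    {attr: (lo, hi)}, meaning lo <= attr <= hi for every listed attr
    """
    def span(attr):
        lo, hi = clauses[attr]
        idx = sorted_idx[attr]
        return (bisect.bisect_left(idx, (lo, float("-inf"))),
                bisect.bisect_right(idx, (hi, float("inf"))))

    # Cost estimate (assumed): drive the scan with the clause matching the fewest entries.
    spans = {a: span(a) for a in clauses}
    driver = min(spans, key=lambda a: spans[a][1] - spans[a][0])
    start, end = spans[driver]

    result = []
    for _, rid in sorted_idx[driver][start:end]:
        # In-line conjunctive checks on the remaining attributes.
        if all(lo <= rows[rid][a] <= hi
               for a, (lo, hi) in clauses.items() if a != driver):
            result.append(rid)
    return driver, result


rows = {
    0: {"price": 10, "stock": 5},
    1: {"price": 25, "stock": 0},
    2: {"price": 30, "stock": 7},
    3: {"price": 90, "stock": 3},
}
sorted_idx = {a: sorted((r[a], rid) for rid, r in rows.items()) for a in ("price", "stock")}
print(conjunctive_scan(sorted_idx, rows, {"price": (20, 40), "stock": (1, 10)}))
# ('price', [2])
```

Choosing the narrowest clause bounds the number of in-line checks by that clause's match count, which is the overhead a random choice cannot control.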

The thresholds $\alpha$ and $\beta$ set the decision points for search expansion and for the selectivity pivot; typical values are $\alpha \approx 30\%$ and $\beta \approx 5\%$. Both can be tuned per workload through empirical analysis.

6. Empirical Evaluation and Performance Characteristics

COMPASS was benchmarked on four hybrid vector/relational datasets (GIST, CRAWL, GLOVE100, VIDEO; up to 2M vectors; four numeric attributes) against two state-of-the-art baselines: NaviX (general HNSW in-filtering) and SeRF×4 (specialized per-attribute indices plus post-filtering).

Key results:

At recall $\geq 0.9$ and 30% selectivity per attribute, the speedup of COMPASS over the best baseline grows with the number of filtered attributes up to three: $1.5\times$ (1D), $2.6\times$ (2D), $5.8\times$ (3D), and $4.7\times$ (4D).
Index memory footprint is 300 MiB (COMPASS), versus 900 MiB (NaviX) and 600 MiB (SeRF×4).
COMPASS query throughput does not decrease as more filters are added (a “selectivity helps” regime), whereas baseline QPS degrades sharply.
In settings with only a single attribute filter, COMPASS matches the QPS of specialized indices within 10–20%.

Empirical findings confirm that COMPASS is robust to complex filter combinations and does not suffer the combinatorial throughput collapse characteristic of in-filtering or post-filtering schemes (Ye et al., 31 Oct 2025).

7. Applications, Limitations, and Future Directions

Applications include:

E-commerce: filtering visually similar products by price, inventory, and category.
Content retrieval: searching for images or frames matching temporal, spatial, or descriptive constraints.
Multi-tenant vector database services requiring support for numerous arbitrary filters per query.

Limitations:

For schemas with many filterable attributes (tens of attributes), more sophisticated intra-cluster indices, such as R-trees or KD-trees, may be beneficial.
The current B-tree scan chooses the driving attribute index at random for multi-attribute predicates, which can add avoidable overhead from in-line conjunctive checks. Incorporating a lightweight cost-based planner is an open avenue.

Extendibility:

Support for insertions and deletions could be added by integrating dynamic graph-update algorithms that keep the IVF clusters and the HNSW graph up to date.
The framework could also be generalized to learned or multi-dimensional per-cluster indices for further query optimization.

In summary, COMPASS provides a practical framework for hybrid vector–structured filtered search: it supports efficient approximate top-$k$ retrieval under arbitrary conjunctive, disjunctive, and range predicates using only off-the-shelf indexing infrastructure, across a broad range of database workloads (Ye et al., 31 Oct 2025).

References

Ye et al. (2025). Compass: General Filtered Search across Vector and Structured Data.
