Policy Alignment Score (PAS)
- Policy Alignment Score (PAS) is a metric that quantifies how well policy objectives match the performance outcomes in systems integrating vector similarity and relational filtering.
- PAS is applied in hybrid search frameworks to evaluate the efficacy of cooperative query execution between standard index structures, such as HNSW and B+-trees.
- By leveraging PAS, researchers can identify optimization opportunities and fine-tune system parameters, ensuring robust performance even under complex filtering conditions.
The COMPASS framework is a unified system for filtered search over hybrid vector and structured (relational) data. It provides efficient, general support for queries that combine high-dimensional vector similarity with arbitrary relational filtering. Unlike prior systems, which either require specialized filtered-vector indices or lack integration with robust database management systems (DBMSs), COMPASS relies on established index structures: approximate nearest neighbor (ANN) methods such as HNSW and IVF for vector search, and B+-trees for structured predicate evaluation. These are coordinated through a cooperative query execution scheme. The architecture supports arbitrary Boolean combinations of predicates (conjunctions, disjunctions, ranges) and scales robustly even under highly selective or high-dimensional filters, without introducing new index designs or sacrificing single-attribute query performance (Ye et al., 31 Oct 2025).
1. System Architecture and Index Structures
COMPASS is centered on the interplay between two high-performance, off-the-shelf index families and a novel cooperative “racing” execution mechanism:
- Vector index (G): An HNSW proximity graph indexes all $d$-dimensional embeddings for efficient approximate nearest neighbor retrieval.
- Clustered relational index (B): The vector space is partitioned via IVF clustering; each cluster maintains per-attribute B+-trees (or alternative 1D/learned indices) on relational columns.
- Shared Candidate Queue (SharedQ): A memory-resident global min-heap collects candidate tuples (with distance and record IDs), dynamically sourced from both G and B, supporting coordinated ranking and de-duplication.
- Progressive Search Control: Search width parameters (the graph search width and the relational probe size) adapt to observed local predicate pass rates, optimizing candidate throughput.
The query process starts in G’s graph-traversal engine (HNSW). If the local predicate pass rate falls below a threshold, the system pivots to B, using IVF cluster selection and within-cluster B+-tree scans to introduce new predicate-matching candidates. Both engines feed into SharedQ, so all candidates are globally ranked by vector distance regardless of the engine that produced them.
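The shared candidate queue described above can be illustrated with a minimal sketch. It assumes candidates are `(distance, record_id)` pairs and uses a Python set in place of the visitation bitmap; the class name `SharedQueue` and its methods are hypothetical and are not taken from the COMPASS implementation.

```python
import heapq


class SharedQueue:
    """Global min-heap of (distance, record_id) pairs with de-duplication.

    Hypothetical illustration: a set stands in for the visitation bitmap.
    """

    def __init__(self):
        self._heap = []        # entries are (distance, record_id)
        self._visited = set()  # record ids that were already enqueued

    def push(self, distance, record_id):
        """Add a candidate; return False if the record was already seen."""
        if record_id in self._visited:
            return False
        self._visited.add(record_id)
        heapq.heappush(self._heap, (distance, record_id))
        return True

    def pop(self):
        """Remove and return the closest remaining candidate."""
        return heapq.heappop(self._heap)

    def __len__(self):
        return len(self._heap)


q = SharedQueue()
q.push(0.42, 7)               # candidate found by the graph engine
q.push(0.10, 3)               # candidate found by the relational engine
duplicate_accepted = q.push(0.42, 7)  # same record again: rejected
closest = q.pop()
```

Because both engines push into the same heap, the candidate popped next is always the nearest one seen so far, whichever engine produced it.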
2. Hybrid Data and Query Model
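COMPASS operates on datasets of the form
$$\mathcal{D} = \{(v_i, a_i)\}_{i=1}^{n}, \qquad v_i \in \mathbb{R}^d,$$
where each tuple consists of a dense vector embedding $v_i$ and a set of relational attributes $a_i$ drawn from schema $\mathcal{A}$.
A filtered search query is $q = (v_q, p, k)$, where $p$ is any Boolean combination (arbitrary $\land$, $\lor$, intervals) of atomic predicates on the attributes (e.g., equality, comparison, and range conditions). The result is the approximate top-$k$ set (by distance $\mathrm{dist}(v_q, v_i)$) among records with $p(a_i) = \text{true}$.
Recall is defined as
$$\text{Recall} = \frac{|R \cap R^*|}{|R^*|},$$
with $R$ the returned set and $R^*$ the exact top-$k$ after filtering.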
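As a concrete reading of the query model, the sketch below computes the exact filtered top-k by brute force and measures recall against it. It assumes Euclidean distance and represents the predicate as a Python callable over an attribute dictionary; the function names, attribute names, and toy records are illustrative assumptions, not part of COMPASS.

```python
import math


def filtered_top_k(records, query_vec, predicate, k):
    """Exact top-k record ids by Euclidean distance among passing records.

    records: list of (record_id, vector, attrs), where attrs is a dict.
    """
    passing = [
        (math.dist(query_vec, vec), rid)
        for rid, vec, attrs in records
        if predicate(attrs)
    ]
    passing.sort()
    return [rid for _, rid in passing[:k]]


def recall(approx_ids, exact_ids):
    """Fraction of the exact filtered top-k present in the approximate result."""
    if not exact_ids:
        return 1.0
    return len(set(approx_ids) & set(exact_ids)) / len(exact_ids)


# Toy data: (id, 2-d embedding, relational attributes).
records = [
    (0, (0.0, 0.0), {"price": 10, "stock": 5}),
    (1, (1.0, 0.0), {"price": 30, "stock": 0}),
    (2, (0.0, 2.0), {"price": 20, "stock": 3}),
    (3, (3.0, 3.0), {"price": 15, "stock": 9}),
]


def predicate(attrs):
    """Boolean combination: (price in [10, 20] AND stock > 0) OR price > 40."""
    return (10 <= attrs["price"] <= 20 and attrs["stock"] > 0) or attrs["price"] > 40


exact = filtered_top_k(records, (0.0, 0.0), predicate, k=2)
score = recall([0, 3], exact)
```

Record 1 is nearer to the query than record 2, but it fails the predicate, so the exact answer skips it. An approximate result that returns record 3 instead of record 2 therefore recovers only half of the true set.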
3. Cooperative Query Execution and Search Control
COMPASS introduces a cooperative, interleaved search strategy (“candidate racing”) where G and B mutually compensate for each other's local weaknesses:
- G.Next(SharedQ): Performs adaptive HNSW traversal. The per-hop expansion is governed by the observed neighborhood pass rate (sel), which is compared against an upper and a lower threshold:
  - If sel is at or above the upper threshold, apply single-hop expansion.
  - If sel lies between the two thresholds, perform limited two-hop expansion.
  - If sel is below the lower threshold, the engine yields to B, as further expansion is unpromising under the predicate.
- B.Next(SharedQ): Runs a progressive proximity search over cluster centroids to identify relevant clusters, then scans the within-cluster B+-trees until predicate-satisfying candidates are found and injects them into SharedQ.
Pseudocode exposes the high-level interleaving and local control logic. The shared visitation bitmap ensures candidates are ranked exactly once.
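The local control decision in the graph engine can be sketched as follows. This is an illustration only: the threshold values `upper=0.5` and `lower=0.1` are placeholders chosen for the example, not values from the source, and the names `Action`, `pass_rate`, and `choose_action` are hypothetical.

```python
from enum import Enum


class Action(Enum):
    ONE_HOP = "one_hop"        # expand direct neighbors only
    TWO_HOP = "two_hop"        # also look at neighbors of neighbors
    YIELD_TO_B = "yield_to_b"  # hand control to the relational engine


def pass_rate(neighbor_attrs, predicate):
    """Fraction of a node's neighbors that satisfy the predicate (sel)."""
    if not neighbor_attrs:
        return 0.0
    passed = sum(1 for attrs in neighbor_attrs if predicate(attrs))
    return passed / len(neighbor_attrs)


def choose_action(sel, upper=0.5, lower=0.1):
    """Pick an expansion mode from the local pass rate.

    The default thresholds are placeholder values for illustration.
    """
    if sel >= upper:
        return Action.ONE_HOP
    if sel >= lower:
        return Action.TWO_HOP
    return Action.YIELD_TO_B


neighbors = [{"price": 5}, {"price": 15}, {"price": 25}, {"price": 35}]
sel = pass_rate(neighbors, lambda attrs: attrs["price"] < 20)
action = choose_action(sel)
```

In a full search loop, a `YIELD_TO_B` result is the point at which the relational engine would be asked for fresh predicate-matching candidates.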
4. Complexity Analysis and Cost Adaptation
Query cost shifts between the two engines according to the global predicate pass rate:
- When the pass rate is high, the cost from G (distance computations plus lightweight predicate checks) dominates.
- When the pass rate is low, the cost from B (B+-tree probes, each logarithmic in the approximate cluster size) dominates.
- Complexity per query (amortized):
- HNSW:
- B-tree:
- Combined:
Heuristics for the two expansion thresholds, dynamic tuning of the graph search width and relational probe size, and candidate queue sizing are integral to effective cost sharing.
5. Implementation and Integration
COMPASS is designed for immediate integration with modern DBMSs:
- Utilizes existing index libraries (HNSW as in FAISS/Milvus, B+-trees) with no new index design.
- SharedQ and visitation bitmap are in-memory, sitting cleanly above underlying indices.
- Predicate pushdown across vector and relational modalities requires minimal engine modifications. Multi-attribute filters are handled by choosing one attribute index (at random or by cost estimate) and applying the remaining conjunctive checks in-line.
- Offers backwards compatibility and incremental adoption.
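The multi-attribute handling described in the list above can be sketched with a sorted list standing in for a per-attribute B+-tree. One attribute drives the range scan and the remaining conjuncts are checked in-line on each scanned row. The helper names and the toy rows are assumptions made for illustration, and the sketch covers conjunctions of closed ranges only.

```python
import bisect


def build_index(rows, attr):
    """Sorted (value, row_id) pairs: a stand-in for a per-attribute B+-tree."""
    return sorted((row[attr], rid) for rid, row in rows.items())


def range_scan(index, lo, hi):
    """Row ids whose indexed value lies in the closed range [lo, hi]."""
    start = bisect.bisect_left(index, (lo,))  # first entry with value >= lo
    ids = []
    for value, rid in index[start:]:
        if value > hi:
            break
        ids.append(rid)
    return ids


def filtered_ids(rows, indexes, ranges, driver):
    """Scan the driver attribute's index, then check the other ranges in-line."""
    lo, hi = ranges[driver]
    result = []
    for rid in range_scan(indexes[driver], lo, hi):
        row = rows[rid]
        others_ok = all(
            low <= row[attr] <= high
            for attr, (low, high) in ranges.items()
            if attr != driver
        )
        if others_ok:
            result.append(rid)
    return result


rows = {
    0: {"price": 10, "stock": 5},
    1: {"price": 30, "stock": 0},
    2: {"price": 20, "stock": 3},
    3: {"price": 15, "stock": 9},
}
indexes = {attr: build_index(rows, attr) for attr in ("price", "stock")}
ranges = {"price": (10, 20), "stock": (4, 10)}

by_price = filtered_ids(rows, indexes, ranges, driver="price")
by_stock = filtered_ids(rows, indexes, ranges, driver="stock")
```

Both driver choices return the same rows, but they scan different numbers of index entries (three with `price`, two with `stock` on this toy data), which is why a cost-based choice of the driving attribute can matter.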
6. Empirical Performance and Benchmarking
In an evaluation across four datasets (GIST, CRAWL, GLOVE100, VIDEO; up to 2M vectors; 4 uniform numeric attributes), COMPASS shows the following results:
| Filter dimensionality | 1D | 2D | 3D | 4D |
|---|---|---|---|---|
| Speedup vs. best baseline | 1.5× | 2.6× | 5.8× | 4.7× |
- COMPASS index size: 300 MiB; NaviX: 900 MiB; SeRF: 600 MiB.
- Throughput (QPS) improves with higher selectivity and additional filter predicates, unlike baseline systems where performance can degrade with filter complexity.
- Matches per-attribute specialized indices within 10–20% QPS overhead in single-attribute settings, while preserving full filtering generality.
- Outperforms NaviX and SeRF by 4× on hybrid queries, especially in multi-attribute and tight-selectivity regimes.
7. Use Cases, Limitations, and Future Directions
Canonical Use Cases:
- E-commerce: multimodal product search with price, inventory, and semantic similarity filters.
- Media retrieval: filtering image/video frames by vector embeddings and complex metadata predicates.
- Multi-tenant vector DB services: supporting high filter dimensionality per query.
Limitations:
- The random selection of attribute indices in B-tree scans is suboptimal; a lightweight cost-based planner could enhance multi-attribute filter efficiency.
- For extremely high attribute dimensionality, per-cluster multi-dimensional indices (R-trees, KD-trees) could yield further gains.
- Dynamic dataset support (inserts/deletes) would benefit from the latest dynamic-graph index update algorithms.
Future extensions may include: more advanced attribute indexing strategies, hierarchical index orchestration for even wider filter sets, and robust support for changing data distributions with dynamic indexing.
Summary: COMPASS establishes a robust, general-purpose framework for hybrid filtered search by orchestrating standard vector and relational indices through an adaptive, cooperative execution architecture. It achieves state-of-the-art throughput and scalability for arbitrary Boolean, range, and multi-attribute queries in vector-structured databases, without bespoke index designs or loss of DBMS integration (Ye et al., 31 Oct 2025).