Policy Alignment Score (PAS)

Updated 7 January 2026
  • Policy Alignment Score (PAS) is a metric that quantifies how well policy objectives match the performance outcomes in systems integrating vector similarity and relational filtering.
  • PAS is applied in hybrid search frameworks to evaluate the efficacy of cooperative query execution between standard index structures, such as HNSW and B+-trees.
  • Using PAS, researchers can identify optimization opportunities and tune system parameters so that performance holds up under complex filtering conditions.

COMPASS Framework

The COMPASS framework is a unified system for filtered search over hybrid vector and structured (relational) data. It efficiently supports queries that combine high-dimensional vector similarity with arbitrary relational filters. Unlike prior systems, which either require specialized filtered-vector indices or lack integration with robust database management systems (DBMSs), COMPASS coordinates established index structures through a principled cooperative query execution scheme: approximate nearest neighbor (ANN) methods such as HNSW and IVF for vector search, and B+-trees for structured predicate evaluation. This architecture supports arbitrary Boolean combinations of predicates (conjunctions, disjunctions, ranges) and scales robustly even under highly selective or high-dimensional (multi-attribute) filters, without new index designs and without sacrificing single-attribute query performance (Ye et al., 31 Oct 2025).

1. System Architecture and Index Structures

COMPASS is centered on the interplay between two high-performance, off-the-shelf index families and a novel cooperative “racing” execution mechanism:

  • Vector index (G): An HNSW proximity graph indexes all $d$-dimensional embeddings for efficient approximate nearest neighbor retrieval.
  • Clustered relational index (B): The vector space is partitioned via IVF clustering; each cluster maintains per-attribute B+-trees (or alternative 1D/learned indices) on relational columns.
  • Shared Candidate Queue (SharedQ): A memory-resident global min-heap collects candidate tuples (with distance and record IDs), dynamically sourced from both G and B, supporting coordinated ranking and de-duplication.
  • Progressive Search Control: Search width parameters (graph search width $efs$, relational probe size $efi$) adapt to observed local predicate pass rates, optimizing candidate throughput.

The query process starts in G’s graph-traversal engine (HNSW). If the local predicate pass rate falls below the threshold $\beta \approx 5\%$, the system pivots to B, using IVF cluster selection and within-cluster B+-tree scans to introduce new predicate-matching candidates. Both engines feed into SharedQ, so all candidates are globally ranked by vector distance regardless of which engine produced them.
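The IVF cluster-selection step that B relies on can be sketched as a generator that visits clusters in increasing centroid distance, letting the relational engine probe one cluster at a time and stop as soon as it has enough matches. This is a minimal illustration, assuming Euclidean distance and an in-memory list of centroids; `progressive_clusters` is a hypothetical name, not part of COMPASS.

```python
import heapq
import math
from typing import Iterator, List, Sequence


def progressive_clusters(q: Sequence[float],
                         centroids: List[Sequence[float]]) -> Iterator[int]:
    """Yield IVF cluster IDs in increasing centroid distance to q, so the
    relational engine can probe one cluster at a time and stop early."""
    heap = [(math.dist(q, c), cid) for cid, c in enumerate(centroids)]
    heapq.heapify(heap)  # O(nlist) build; each later pop is O(log nlist)
    while heap:
        _, cid = heapq.heappop(heap)
        yield cid


centroids = [[0.0, 0.0], [5.0, 5.0], [1.0, 1.0]]
order = progressive_clusters([0.8, 0.9], centroids)
print(next(order), next(order))  # 2 0
```

Because the generator is lazy, clusters that are never needed are never popped, which mirrors the "progressive" probing described above.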

2. Hybrid Data and Query Model

COMPASS operates on datasets of the form

$$D = \{ (v_i \in \mathbb{R}^d,\ a_i \in \mathcal{A}) \}_{i=1}^{n}$$

where each tuple consists of a dense vector embedding $v_i$ and a set of relational attributes $a_i$ drawn from the schema $\mathcal{A} = \{A_1, \dots, A_m\}$.

A filtered search query is $Q = (q \in \mathbb{R}^d,\ p: \mathcal{A} \to \{\text{true}, \text{false}\})$, where $p(a)$ is any Boolean combination (arbitrary $\land$, $\lor$, intervals) of atomic predicates (e.g., $A_j = c$, $A_j \le c$, $c_1 \le A_j \le c_2$). The result is the approximate top-$k$ set (by distance $\delta(v_i, q)$) among records with $p(a_i) = \text{true}$.
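To make the query model concrete, the sketch below computes the exact filtered result $S^*_k$ by brute force: it applies an arbitrary Boolean predicate $p$ to each record's attributes and keeps the $k$ closest survivors. It assumes Euclidean distance for $\delta$, and the names used (`exact_filtered_topk`, the `price`/`stock` attributes) are illustrative only. COMPASS approximates this result with indices rather than a full scan.

```python
import heapq
import math
from typing import Callable, Dict, List, Sequence, Tuple

Record = Tuple[List[float], Dict[str, float]]  # (v_i, a_i)


def exact_filtered_topk(data: List[Record], q: Sequence[float],
                        p: Callable[[Dict[str, float]], bool], k: int) -> List[int]:
    """Brute-force S*_k: keep records whose attributes satisfy p, then return
    the indices of the k closest to q, ordered by distance."""
    # Euclidean distance is assumed here as delta(v_i, q).
    scored = ((math.dist(v, q), i) for i, (v, a) in enumerate(data) if p(a))
    return [i for _, i in heapq.nsmallest(k, scored)]


data = [
    ([0.0, 0.0], {"price": 10, "stock": 5}),
    ([1.0, 0.0], {"price": 50, "stock": 0}),
    ([0.0, 2.0], {"price": 30, "stock": 2}),
    ([3.0, 3.0], {"price": 20, "stock": 9}),
]
# Example predicate: (10 <= price <= 40) AND (stock > 0 OR price < 15)
p = lambda a: 10 <= a["price"] <= 40 and (a["stock"] > 0 or a["price"] < 15)
print(exact_filtered_topk(data, [0.0, 1.5], p, k=2))  # [2, 0]
```

Record 1 is the second-nearest vector overall but fails the price range, so it never enters the result; this is exactly the behavior an index-based filtered search must reproduce.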

Recall is defined as $R = \frac{|S_k \cap S^*_k|}{|S^*_k|}$, where $S_k$ is the returned result set and $S^*_k$ is the exact top-$k$ after filtering.
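The recall formula translates directly into code over lists of record IDs. In this sketch, returning 1.0 when $S^*_k$ is empty (no record passes the filter) is an assumed convention, since the definition above leaves that case undefined.

```python
def recall(approx: list, exact: list) -> float:
    """R = |S_k ∩ S*_k| / |S*_k| for lists of record IDs."""
    truth = set(exact)
    if not truth:
        # Assumed convention: nothing to find, so nothing is missed.
        return 1.0
    return len(set(approx) & truth) / len(truth)


print(recall([2, 0, 5], [2, 0, 3]))  # 0.6666666666666666
```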

3. Cooperative Query Execution and Search Control

COMPASS introduces a cooperative, interleaved search strategy (“candidate racing”) where G and B mutually compensate for each other's local weaknesses:

  • G.Next(SharedQ): Performs adaptive HNSW traversal. The per-hop expansion is governed by the observed neighborhood pass rate ($\text{sel}$):
    • If $\text{sel} \ge \alpha$ (typically $\alpha \approx 30\%$), apply single-hop expansion.
    • If $\beta \le \text{sel} < \alpha$, perform limited two-hop expansion.
    • If $\text{sel} < \beta$, the engine yields to B, as further expansion is unpromising under the predicate.
  • B.Next(SharedQ): Runs a progressive proximity search over cluster centroids to identify relevant clusters, then scans their within-cluster B+-trees until $efi$ predicate-satisfying candidates are found and injects these into SharedQ.
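The three-way expansion rule for G can be written as a small policy function. This is a sketch, not the COMPASS code: it assumes $\text{sel}$ is measured as the fraction of inspected neighbors that pass the predicate, uses the typical thresholds quoted above as defaults, and treats a hop with no inspected neighbors as having a pass rate of zero (also an assumption). `Step` and `choose_step` are hypothetical names.

```python
from enum import Enum


class Step(Enum):
    ONE_HOP = "single-hop expansion"
    TWO_HOP = "limited two-hop expansion"
    YIELD_TO_B = "yield to relational engine B"


def choose_step(passed: int, inspected: int,
                alpha: float = 0.30, beta: float = 0.05) -> Step:
    """Map the observed neighborhood pass rate sel = passed / inspected
    to one of the three expansion rules."""
    # Assumption: a hop with no inspected neighbors counts as sel = 0.
    sel = passed / inspected if inspected else 0.0
    if sel >= alpha:
        return Step.ONE_HOP
    if sel >= beta:  # beta <= sel < alpha
        return Step.TWO_HOP
    return Step.YIELD_TO_B


for passed in (12, 4, 1):
    print(passed, "of 32 pass ->", choose_step(passed, inspected=32).value)
```

With 32 inspected neighbors, 12 passing (37.5%) keeps single-hop expansion, 4 passing (12.5%) triggers two-hop expansion, and 1 passing (about 3%) hands control to B.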

The paper's pseudocode details the high-level interleaving and local control logic. A shared visitation bitmap ensures that each candidate is ranked exactly once, even when both engines reach it.
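The interleaving can be illustrated with a toy loop in which G and B feed one min-heap guarded by a visited set. This is a simplified sketch under stated assumptions, not the paper's pseudocode: G is modeled as an iterator of per-hop neighbor batches already annotated with distances and predicate outcomes, B as an iterator of predicate-passing batches, the switch to B uses only the $\beta$ threshold, and a fixed round budget (`max_rounds`, hypothetical) stands in for COMPASS's actual termination logic.

```python
import heapq
from typing import Iterator, List, Tuple

GHop = List[Tuple[float, int, bool]]  # (distance, record_id, passes_predicate)
BScan = List[Tuple[float, int]]       # (distance, record_id), already passing


def cooperative_topk(g_hops: Iterator[GHop], b_scans: Iterator[BScan], k: int,
                     beta: float = 0.05, max_rounds: int = 100) -> List[Tuple[float, int]]:
    shared_q: List[Tuple[float, int]] = []  # global min-heap keyed by distance
    visited = set()                          # stand-in for the visitation bitmap

    def offer(dist: float, rid: int) -> None:
        # Rank each record at most once, whichever engine reaches it first.
        if rid not in visited:
            visited.add(rid)
            heapq.heappush(shared_q, (dist, rid))

    g_alive, b_alive = True, True
    for _ in range(max_rounds):
        if not (g_alive or b_alive):
            break
        use_b = not g_alive  # once G is exhausted, only B can contribute
        if g_alive:
            hop = next(g_hops, None)
            if hop is None:
                g_alive, use_b = False, True
            else:
                passed = [(d, r) for d, r, ok in hop if ok]
                for d, r in passed:
                    offer(d, r)
                # Local pass rate below beta: let B inject candidates this round.
                use_b = len(passed) < beta * len(hop)
        if use_b and b_alive:
            scan = next(b_scans, None)
            if scan is None:
                b_alive = False
            else:
                for d, r in scan:
                    offer(d, r)
    return heapq.nsmallest(k, shared_q)


g = iter([[(0.9, 1, True), (1.1, 2, True)],     # pass rate 100%: G only
          [(1.3, 3, False), (1.4, 4, False)]])  # pass rate 0%: B joins
b = iter([[(0.5, 7), (1.1, 2)], [(2.0, 8)]])    # record 2 is de-duplicated
print(cooperative_topk(g, b, k=3))  # [(0.5, 7), (0.9, 1), (1.1, 2)]
```

Record 7 is found only through B, after G's second hop shows a zero pass rate, yet it ends up ranked first because both engines share one distance-ordered queue.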

4. Complexity Analysis and Cost Adaptation

Query cost is dynamically allocated based on the global predicate pass rate $r$:

  • When $r$ is high, $C_\text{graph}$, the cost from G ($O(efs \cdot d)$ distance computations plus lightweight predicate checks), dominates.
  • When $r$ is low, $C_\text{rel}$, the cost from B ($O(efi \cdot \log N_c)$ per B-tree probe, where $N_c$ is the approximate cluster size), dominates.
  • Complexity per query (amortized):
    • HNSW: $O((efs/\Delta)\, d + \#\text{edges})$
    • B-tree: $O(efi \cdot \log|\text{cluster}|)$
    • Combined: $O(\min[efs, d, r]\, d + efi \cdot \log(N/nlist))$
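To give a sense of the relative size of the two leading terms, here is a back-of-the-envelope evaluation under assumed parameters (illustrative values, not ones reported for COMPASS): $N = 2 \times 10^6$, $nlist = 1000$, $efs = efi = 64$, $d = 960$, with the logarithm taken base 2.

```latex
\begin{aligned}
efi \cdot \log_2(N/nlist) &= 64 \cdot \log_2(2000) \approx 64 \cdot 10.97 \approx 702 \ \text{key comparisons} \\
efs \cdot d &= 64 \cdot 960 = 61{,}440 \ \text{scalar operations in distance computations}
\end{aligned}
```

Under these assumptions a round of relational probes is cheap next to a round of graph-side distance computations, although each candidate that B injects must still have its vector distance computed before it can be ranked in SharedQ.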

Heuristics for the expansion thresholds ($\alpha$, $\beta$), dynamic tuning of $efs$ and $efi$, and candidate queue sizing are integral to balancing cost between the two engines.

5. Implementation and Integration

COMPASS is designed for immediate integration with modern DBMSs:

  • Utilizes existing index libraries (HNSW as in FAISS/Milvus, B+-trees) with no new index design.
  • SharedQ and visitation bitmap are in-memory, sitting cleanly above underlying indices.
  • Predicate pushdown across vector and relational modalities requires minimal engine modifications; multi-attribute filters are handled by choosing an attribute index (random or cost-estimate) and applying conjunctive checks in-line.
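The multi-attribute handling described above can be sketched for a single cluster. Here, sorted `(value, record_id)` lists stand in for per-attribute B+-trees, the driving attribute is chosen by a simple cost estimate (the number of index entries its range covers, found by binary search), and all range predicates are checked in-line on each scanned row. `ClusterIndex` and its methods are hypothetical names; the source states only that the attribute index is chosen at random or by cost estimate.

```python
import bisect
from typing import Dict, List, Tuple

Range = Tuple[float, float]  # inclusive [lo, hi] bounds on one attribute


class ClusterIndex:
    """Hypothetical per-cluster index: one sorted (value, record_id) list per
    attribute, standing in for a per-attribute B+-tree. Assumes every row has
    the same attributes and that the cluster is non-empty."""

    def __init__(self, rows: Dict[int, Dict[str, float]]):
        self.rows = rows
        attrs = next(iter(rows.values())).keys()
        self.sorted = {a: sorted((r[a], rid) for rid, r in rows.items()) for a in attrs}

    def _span(self, attr: str, bounds: Range) -> Tuple[int, int]:
        # Binary-search the [lo, hi] key range, like a B+-tree range lookup.
        lo, hi = bounds
        keys = self.sorted[attr]
        start = bisect.bisect_left(keys, (lo, float("-inf")))
        end = bisect.bisect_right(keys, (hi, float("inf")))
        return start, end

    def scan(self, preds: Dict[str, Range], efi: int) -> List[int]:
        def cost(attr: str) -> int:
            # Cost estimate: number of index entries the attribute's range covers.
            start, end = self._span(attr, preds[attr])
            return end - start

        driver = min(preds, key=cost)
        start, end = self._span(driver, preds[driver])
        out: List[int] = []
        for _, rid in self.sorted[driver][start:end]:
            row = self.rows[rid]
            # In-line conjunctive check of every predicate (including the driver's).
            if all(lo <= row[a] <= hi for a, (lo, hi) in preds.items()):
                out.append(rid)
                if len(out) == efi:  # stop once efi matches are found
                    break
        return out


rows = {
    1: {"price": 10, "stock": 0},
    2: {"price": 25, "stock": 3},
    3: {"price": 40, "stock": 7},
    4: {"price": 55, "stock": 1},
    5: {"price": 30, "stock": 9},
}
idx = ClusterIndex(rows)
# The price range covers 3 entries and the stock range covers 2, so stock drives the scan.
print(idx.scan({"price": (20, 50), "stock": (5, 10)}, efi=2))  # [3, 5]
```

Replacing `min(preds, key=cost)` with a random choice gives the simpler variant the text mentions; the limitation noted later about random attribute selection is the gap this kind of estimate aims to close.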
  • Offers backwards compatibility and incremental adoption.

6. Empirical Performance and Benchmarking

In an evaluation across four datasets (GIST, CRAWL, GLOVE100, VIDEO; up to 2M vectors; four uniform numeric attributes), COMPASS demonstrates:

| Filter dimensions | 1D | 2D | 3D | 4D |
|---|---|---|---|---|
| Speedup vs. best baseline | 1.5× | 2.6× | 5.8× | 4.7× |

  • Index size: COMPASS ~300 MiB; NaviX ~900 MiB; SeRF ×4 ~600 MiB.
  • Throughput (QPS) improves with higher selectivity and additional filter predicates, unlike baseline systems where performance can degrade with filter complexity.
  • Matches per-attribute specialized indices within 10–20% QPS overhead in single-attribute settings, while preserving full filtering generality.
  • Outperforms NaviX and SeRF ×4 on hybrid queries, especially in multi-attribute and tight-selectivity regimes.

7. Use Cases, Limitations, and Future Directions

Canonical Use Cases:

  • E-commerce: multimodal product search with price, inventory, and semantic similarity filters.
  • Media retrieval: filtering image/video frames by vector embeddings and complex metadata predicates.
  • Multi-tenant vector DB services: supporting high filter dimensionality per query.

Limitations:

  • The random selection of attribute indices in B-tree scans is suboptimal; a lightweight cost-based planner could enhance multi-attribute filter efficiency.
  • For extremely high attribute dimensionality, per-cluster multi-dimensional indices (R-trees, KD-trees) could yield further gains.
  • Dynamic dataset support (inserts/deletes) would benefit from the latest dynamic-graph index update algorithms.

Future extensions may include more advanced attribute indexing strategies, hierarchical index orchestration for even wider filter sets, and robust support for changing data distributions with dynamic indexing.

Summary: COMPASS establishes a robust, general-purpose framework for hybrid filtered search by orchestrating standard vector and relational indices through an adaptive, cooperative execution architecture. It achieves state-of-the-art throughput and scalability for arbitrary Boolean, range, and multi-attribute queries in vector-structured databases, without bespoke index designs or loss of DBMS integration (Ye et al., 31 Oct 2025).
