
NaviX: Native Vector Index in GDBMS

Updated 1 July 2025
  • NaviX is a native vector indexing system integrated within graph DBMSs that supports predicate-agnostic, filtered kNN search using adaptive-local heuristics.
  • It leverages a Hierarchical Navigable Small-World (HNSW) proximity graph and prefiltering with node semimasks to optimize query accuracy and performance.
  • Its integration with DBMS infrastructure provides transactional guarantees and scalability, making it well suited to hybrid predictive and analytical workloads.

NaviX denotes several distinct systems and frameworks in current research, spanning vector indexing for graph databases, scalable reinforcement learning environments, and navigation solutions in domains from computer vision to X-ray astronomy. This article provides a comprehensive overview of the principal "NaviX" system as a native vector index for graph DBMSs, contextualizes it among related research threads, and clarifies technical, methodological, and practical dimensions.

1. Native Vector Indexing in Graph DBMSs

NaviX is a disk-based vector indexing system natively integrated within a graph database management system (GDBMS). Its design is motivated by the need for unified support of modern predictive applications, particularly those that require joint querying over vector embeddings and graph-structured or relational properties.

Two primary goals define NaviX's architecture:

  • To implement a disk-based vector index that leverages the GDBMS’s intrinsic storage and query-processing facilities;
  • To support predicate-agnostic, filtered k-nearest neighbor (kNN) search queries, where the nearest neighbors to a query vector $v_Q$ are identified exclusively within an arbitrary, ad-hoc subset $S$ of vectors produced by a subquery $Q_S$.
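
The semantics of a filtered kNN query can be pinned down with an exact, brute-force reference. This is a minimal sketch, not NaviX's index-based search: the function name `filtered_knn`, the toy vectors, and the choice of Euclidean distance are assumptions made for illustration.

```python
import heapq
import math

def filtered_knn(vectors, selected, v_q, k):
    """Exact filtered kNN: the k ids in `selected` whose vectors are closest to v_q."""
    scored = ((math.dist(vectors[i], v_q), i) for i in selected)
    return [i for _, i in heapq.nsmallest(k, scored)]

# Hypothetical toy data: vector id -> embedding.
vectors = {0: (0.0, 0.0), 1: (1.0, 0.0), 2: (0.0, 2.0), 3: (3.0, 3.0)}
S = {1, 2, 3}  # ids produced by the subquery Q_S; id 0 is filtered out
result = filtered_knn(vectors, S, v_q=(0.0, 0.0), k=2)
print(result)  # [1, 2]
```

Vector 0 coincides with the query but is excluded because it is not in `S`. An approximate index is judged by how closely it reproduces this answer.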

The backend uses a Hierarchical Navigable Small-World (HNSW) proximity graph as its core index structure. HNSW’s multilevel organization maps directly onto the GDBMS data model: the lower layers (containing all vectors and their adjacency lists) and the upper sampled layers (used for efficient entry-point search) are managed as compressed sparse row (CSR) graphs using the GDBMS’s existing buffer manager and disk-based storage infrastructure.
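
To make the CSR representation concrete, the sketch below packs the adjacency lists of one proximity-graph layer into two flat arrays. It illustrates the general CSR layout only; the helper names `to_csr` and `nbrs` and the four-node graph are hypothetical, and the actual on-disk format used by NaviX is not described here.

```python
def to_csr(adjacency, num_nodes):
    """Pack adjacency lists into CSR form.

    neighbors[offsets[i]:offsets[i + 1]] holds the neighbors of node i.
    """
    offsets, neighbors = [0], []
    for node in range(num_nodes):
        neighbors.extend(adjacency.get(node, []))
        offsets.append(len(neighbors))
    return offsets, neighbors

def nbrs(offsets, neighbors, node):
    """Return the neighbor list of `node` from the CSR arrays."""
    return neighbors[offsets[node]:offsets[node + 1]]

# Hypothetical adjacency of one HNSW layer with four nodes.
layer = {0: [1, 2], 1: [0, 3], 2: [0], 3: [1]}
offsets, neighbors = to_csr(layer, 4)
print(offsets)                      # [0, 2, 4, 5, 6]
print(nbrs(offsets, neighbors, 1))  # [0, 3]
```

Because each neighbor list is a contiguous slice, a scan over a node's neighbors touches one contiguous region, which is what makes the layout a natural fit for page-based buffer management.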

This approach allows the vector index to inherit all core database benefits: transactional guarantees, concurrent querying, disk/memory locality, and unified query optimization strategies.

2. Filtered Search via Prefiltering

NaviX addresses the challenge of filtered vector search by precomputing the subset $S$ using the predicate subquery $Q_S$. The result of $Q_S$, which may involve arbitrary filters or joins, is materialized as a "node semimask," a bitmask indicating which graph nodes belong to $S$.

The kNN vector search operator then uses this semimask to restrict its candidate exploration. This prefiltering approach leverages the full knowledge of $S$ before commencing vector search, which contrasts with postfiltering approaches that discover neighbors and then test for set membership, potentially incurring redundant computation.
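
The following sketch shows a semimask as a plain bitmask and contrasts the two filtering orders using exact search over six one-dimensional vectors. The helpers `build_semimask` and `in_mask` and the data are hypothetical. The naive postfilter shown here does not over-fetch, which real postfiltering systems typically do to compensate.

```python
import math

def build_semimask(selected_ids, num_nodes):
    """One bit per node; bit i is set iff node i belongs to S."""
    mask = bytearray((num_nodes + 7) // 8)
    for i in selected_ids:
        mask[i // 8] |= 1 << (i % 8)
    return mask

def in_mask(mask, i):
    """Constant-time membership test against the semimask."""
    return bool(mask[i // 8] & (1 << (i % 8)))

vectors = [(0.0,), (1.0,), (2.0,), (3.0,), (4.0,), (5.0,)]
v_q, k = (0.0,), 2
mask = build_semimask({3, 4, 5}, len(vectors))

by_distance = sorted(range(len(vectors)), key=lambda i: math.dist(vectors[i], v_q))

# Postfiltering: take the k nearest overall, then test membership in S.
postfiltered = [i for i in by_distance[:k] if in_mask(mask, i)]
# Prefiltering: restrict candidates to S first, then take the k nearest.
prefiltered = [i for i in by_distance if in_mask(mask, i)][:k]

print(postfiltered)  # []
print(prefiltered)   # [3, 4]
```

The postfiltered query spends its distance computations on nodes 0 and 1, neither of which is in $S$, and returns nothing. The prefiltered query returns the two nearest selected nodes.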

Prefiltering can present challenges if the induced subgraph $G[S]$ is sparse or disconnected under low selectivity, making kNN search expensive or incomplete. NaviX addresses this by deploying adaptive search heuristics.
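
The connectivity problem can be illustrated by counting the connected components of the induced subgraph for a dense and a sparse selection. This is a toy sketch under simplifying assumptions: the graph is an undirected path of six nodes (real HNSW layers are directed and far denser), and `components_in_subgraph` is a hypothetical helper.

```python
from collections import deque

def components_in_subgraph(adj, S):
    """Count connected components of the subgraph induced by node set S."""
    seen, count = set(), 0
    for start in S:
        if start in seen:
            continue
        count += 1
        seen.add(start)
        queue = deque([start])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                # Only edges with both endpoints in S survive in G[S].
                if v in S and v not in seen:
                    seen.add(v)
                    queue.append(v)
    return count

# Hypothetical path graph 0-1-2-3-4-5.
adj = {i: [j for j in (i - 1, i + 1) if 0 <= j <= 5] for i in range(6)}
dense = components_in_subgraph(adj, {0, 1, 2, 3, 4})
sparse = components_in_subgraph(adj, {0, 2, 5})
print(dense, sparse)  # 1 3
```

With the sparse selection, a traversal that only follows edges between selected nodes cannot get from node 0 to node 2 or node 5, so it would miss valid neighbors.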

3. Adaptive Search Algorithm and Implementation

NaviX develops an adaptive-local search algorithm to ensure robust performance across a range of selectivities (the proportion of $S$ within all graph nodes) and query–subset correlations. The algorithm dynamically chooses exploration heuristics at each node during graph traversal based on "local selectivity," i.e., the density of selected neighbors in the vicinity of the current candidate node.

Key implemented heuristics include:

  • onehop-s: Only explore selected immediate neighbors (optimal at high selectivity).
  • blind: At low selectivities, consider up to $M$ selected nodes among immediate and second-degree neighbors in arbitrary order.
  • directed: Direct traversal by order of proximity to $v_Q$ (optimal at moderate selectivities).
  • adaptive-global: Selects a single heuristic for the entire query based on global selectivity.
  • adaptive-local (NaviX default): Chooses the most effective heuristic for each candidate node, based on the fraction $\sigma_l = \frac{|S(\mathrm{nbrs}(c_\mathrm{min}))|}{|\mathrm{nbrs}(c_\mathrm{min})|}$, where $\mathrm{nbrs}(c_\mathrm{min})$ denotes the neighbors of the current candidate node $c_\mathrm{min}$.
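
The per-node decision can be sketched as follows. The computation of $\sigma_l$ follows the definition above, but the thresholds `HIGH` and `LOW` are hypothetical values chosen only for illustration, and the mapping from selectivity ranges to heuristics simply mirrors the regimes named in the list.

```python
def local_selectivity(neighbors, S):
    """sigma_l: fraction of the candidate's neighbors that are in S."""
    if not neighbors:
        return 0.0
    return sum(1 for n in neighbors if n in S) / len(neighbors)

# Hypothetical thresholds; the source does not state the actual cutoffs.
HIGH, LOW = 0.5, 0.1

def choose_heuristic(neighbors, S):
    """Pick an exploration heuristic for one candidate node."""
    sigma_l = local_selectivity(neighbors, S)
    if sigma_l >= HIGH:
        return "onehop-s"   # many selected neighbors: one hop is enough
    if sigma_l >= LOW:
        return "directed"   # moderate: expand in order of proximity to v_Q
    return "blind"          # few selected neighbors: look two hops out

print(choose_heuristic([1, 2, 3, 4], {1, 2, 3}))   # onehop-s
print(choose_heuristic(list(range(10)), {0, 1}))   # directed
print(choose_heuristic(list(range(20)), {0}))      # blind
```

An adaptive-global variant would call the same kind of function once per query with the global selectivity, rather than once per candidate node.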

Disk-based deployment involves storing lower HNSW levels as CSRs in the GDBMS, vectors in a columnar format, and reusing the buffer manager’s memory and I/O capabilities. Distance computations invoke a "zero-copy" process whereby the vector search routines operate directly on buffered data.
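
One way to picture the zero-copy idea, using only Python's standard library, is a typed view over a byte buffer that stands in for a buffer-manager page. The page layout, `DIM`, and the helper names are assumptions for illustration and do not describe NaviX's storage format.

```python
import struct

DIM = 3
# Stand-in for a buffered page holding two float32 vectors back to back.
page = bytearray(2 * DIM * 4)
struct.pack_into("6f", page, 0, 1.0, 2.0, 2.0, 3.0, 4.0, 0.0)

# Typed view over the same bytes; no data is copied.
floats = memoryview(page).cast("f")

def vector_at(view, slot, dim=DIM):
    """Slicing a memoryview yields another view, not a copy."""
    return view[slot * dim:(slot + 1) * dim]

def sq_l2(a, b):
    """Squared Euclidean distance computed directly on the view."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

v1 = vector_at(floats, 1)
d = sq_l2(v1, (0.0, 0.0, 0.0))
print(d)  # 25.0

# Writing to the page is visible through the view, confirming shared memory.
struct.pack_into("f", page, DIM * 4, 6.0)
print(v1[0])  # 6.0
```

The distance routine reads the vector in place, so the only memory traffic is the read of the page itself.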

The search loop can be summarized as:

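import heapq

while C:                                      # C: min-heap of (distance, node) candidates
    d_min, c_min = heapq.heappop(C)           # closest unexplored candidate
    for n in nbrs(c_min):
        if n in S and n not in visited:       # S: node semimask produced by Q_S
            visited.add(n)                    # never score the same node twice
            heapq.heappush(C, (dist(v_Q, n), n))
The adaptive-local logic determines which nodes $n$ to explore at each iteration.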
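
A self-contained version of this loop, restricted to a single layer and to the onehop-s heuristic, might look like the sketch below. The result heap, the `ef` beam width, the stopping rule, and the toy graph are assumptions borrowed from standard HNSW-style best-first search and are not taken from the NaviX implementation.

```python
import heapq
import math

def filtered_search(adj, vectors, S, v_q, entry, k, ef=8):
    """Best-first search over one layer, expanding only neighbors in S."""
    def dist(n):
        return math.dist(vectors[n], v_q)

    visited = {entry}
    C = [(dist(entry), entry)]      # min-heap of candidates
    R = []                          # max-heap (negated distances) of best results
    if entry in S:
        R.append((-dist(entry), entry))
    while C:
        d_min, c_min = heapq.heappop(C)
        # Stop once the closest candidate is worse than the worst kept result.
        if len(R) >= ef and d_min > -R[0][0]:
            break
        for n in adj[c_min]:
            if n in S and n not in visited:
                visited.add(n)
                d = dist(n)
                heapq.heappush(C, (d, n))
                heapq.heappush(R, (-d, n))
                if len(R) > ef:
                    heapq.heappop(R)  # drop the current worst result
    nearest_first = sorted((-neg_d, n) for neg_d, n in R)
    return [n for _, n in nearest_first[:k]]

# Hypothetical layer: 8 points on a line, each linked to nodes within 2 steps.
vectors = {i: (float(i),) for i in range(8)}
adj = {i: [j for j in (i - 2, i - 1, i + 1, i + 2) if 0 <= j < 8] for i in range(8)}
S = {0, 2, 3, 5, 6}
result = filtered_search(adj, vectors, S, v_q=(5.2,), entry=0, k=2)
print(result)  # [5, 6]
```

Here every selected node is reachable through selected neighbors, so onehop-s suffices. When the selection is sparser, the blind or directed heuristics would be needed to step over unselected nodes.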

4. Experimental Analysis

Benchmarks were performed on datasets ranging from one to over fifteen million vectors, and with both uncorrelated (random filters) and correlated (attribute or join-based) selection subqueries.

Key results include:

  • NaviX’s adaptive-local algorithm outperforms both prefiltering and postfiltering baselines, with up to 1.7× improvement over non-local adaptive heuristics in correlated queries.
  • NaviX is consistently faster than Weaviate, Milvus, ACORN, and iRangeGraph for prefiltering-based vector search, and outperforms general-purpose methods (PGVectorScale, VBase) and disk-based systems (DiskANN, FilteredDiskANN) in disk-bound and intermediate selectivity regimes.
  • All comparisons are made at matched recall targets (typically 95%), so the reported speedups do not come at the cost of neighbor retrieval accuracy.
  • The advantage of GDBMS-native storage is most pronounced under fluctuating disk/memory allocations, where caching adjacency lists and vector storage enhances throughput as buffer memory increases.
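
Recall at a given k, the quantity held fixed across systems in these comparisons, can be computed as in the sketch below. This is the conventional definition; the function name and the sample id lists are hypothetical, and the paper's exact evaluation harness is not reproduced here.

```python
def recall_at_k(retrieved, ground_truth, k):
    """Fraction of the true k nearest neighbors found in the retrieved top-k."""
    truth = set(ground_truth[:k])
    return len(truth.intersection(retrieved[:k])) / len(truth)

# Hypothetical result lists for one query (ids ordered nearest first).
approximate = [4, 9, 1, 7]
exact = [4, 1, 7, 2]
r = recall_at_k(approximate, exact, k=4)
print(r)  # 0.75
```

In a filtered setting, the ground truth is the exact kNN within $S$, so a method is not rewarded for returning close vectors that fail the predicate.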

The paper’s ablation studies show that adaptive-local’s gains are algorithmic and not purely infrastructural: when implemented atop FAISS (independent of the GDBMS), adaptive-local still surpasses other heuristics.

5. Application Domains

NaviX’s capabilities are applicable to a range of hybrid data and predictive workloads, including:

  • Retrieval-Augmented Generation (RAG): For question-answering and LLM workflows, where relevant data “chunks” must be retrieved based on semantic similarity as well as graph/relational constraints.
  • Recommendation Systems: Combining attribute filtering (e.g., price, location) with vector similarity of product or user embeddings.
  • Knowledge Graph Analytics: Supporting joint queries over entities, their relationships, and vector semantics within graph-structured and embedded databases.

The architecture allows users to express complex hybrid queries in the GDBMS’s language (e.g., Cypher), benefiting from a uniform query interface and transactional semantics.

6. Implications of GDBMS-Native Vector Indices

Integration of vector indices directly into GDBMSs has several practical and engineering implications:

  • Unified predictive and analytical workflows: Hybrid queries can seamlessly mix graph, attribute, and vector-based criteria.
  • Resource efficiency and maintainability: The vector index layer reuses all transaction, query optimization, concurrency, buffer management, and security features of the core DBMS.
  • Scalability: Performance scales automatically with improvements to underlying database infrastructure (I/O, buffer, parallelism).
  • Expressivity: Researchers and practitioners can formulate expressive, composable queries for advanced analytics, predictive modeling, recommendation, and search.

7. Other Uses of the Name

The name NaviX also appears in other research contexts, notably:

  • As a reinforcement learning environment reimplementation in JAX, emphasizing scalability and accelerator support ["NAVIX: Scaling MiniGrid Environments with JAX" (2407.19396)];
  • As part of navigation and scene analysis solutions in computer vision, and historically as a label for X-ray navigation programs in astrophysics ["The NRL Program in X-ray Navigation" (1712.03832)].

Each use of the term denotes a specialized system; the present vector index NaviX is distinguished by its focus on hybrid graph–vector query support in enterprise and research data platforms.


NaviX thus represents a fully integrated, adaptive, and robust vector search index for graph DBMSs, designed for predicate-agnostic filtering and efficient support of contemporary predictive and hybrid graph workloads. By exploiting the structural alignment of proximity graphs and graph DBMS infrastructure, and by employing dynamic search heuristics, it achieves competitive speed and flexibility for advanced applications.
