High Dimensional Similarity Search with Satellite System Graph: Efficiency, Scalability, and Unindexed Query Compatibility (1907.06146v3)

Published 13 Jul 2019 in cs.IR and cs.DB

Abstract: Approximate Nearest Neighbor Search (ANNS) in high dimensional space is essential in database and information retrieval. Recently, there has been a surge of interest in exploring efficient graph-based indices for the ANNS problem. Among them, Navigating Spreading-out Graph (NSG) provides fine theoretical analysis and achieves state-of-the-art performance. However, we find there are several limitations with NSG: 1) NSG has no theoretical guarantee on nearest neighbor search when the query is not indexed in the database; 2) NSG is too sparse which harms the search performance. In addition, NSG suffers from high indexing complexity. To address the above problems, we propose the Satellite System Graphs (SSG) and a practical variant NSSG. Specifically, we propose a novel pruning strategy to produce SSGs from the complete graph. SSGs define a new family of MSNETs in which the out-edges of each node are distributed evenly in all directions. Each node in the graph builds effective connections to its neighborhood omnidirectionally, whereupon we derive SSG's excellent theoretical properties for both indexed and unindexed queries. We can adaptively adjust the sparsity of an SSG with a hyper-parameter to optimize the search performance. Further, NSSG is proposed to reduce the indexing complexity of the SSG for large-scale applications. Both theoretical and extensive experimental analyses are provided to demonstrate the strengths of the proposed approach over the existing representative algorithms. Our code has been released at https://github.com/ZJULearning/SSG.

Authors (3)
  1. Cong Fu (24 papers)
  2. Changxu Wang (3 papers)
  3. Deng Cai (181 papers)
Citations (12)

Summary

  • The paper presents a novel pruning strategy that constructs SSGs from the complete graph, with the aim of improving search efficiency in high-dimensional spaces.
  • It provides rigorous theoretical analysis ensuring that SSG supports both indexed and unindexed queries through evenly distributed node connections and adjustable sparsity.
  • The proposed NSSG variant reduces the indexing complexity of SSG, making the approach practical for large-scale, real-world high-dimensional search applications.

The paper addresses the problem of Approximate Nearest Neighbor Search (ANNS) in high-dimensional spaces, a crucial aspect for several applications across databases, information retrieval, and machine learning. Traditionally, this problem has been approached using various techniques, including tree-based, hashing-based, quantization-based, and graph-based methods. Among these, graph-based methods have demonstrated superior search performance, particularly in large datasets, leading to considerable interest in their further optimization.

The authors focus particularly on the limitations of the Navigating Spreading-out Graph (NSG), which, despite its efficacy, lacks certain guarantees and suffers from high index construction costs. Specifically, NSG does not guarantee performance when dealing with unindexed queries and can be overly sparse, which might harm search performance.

To overcome these limitations, the authors introduce a graph structure called the Satellite System Graph (SSG) and a practical variant, the Navigating Satellite System Graph (NSSG). An SSG is produced by applying a novel pruning strategy to the complete graph. SSGs define a new family of MSNETs (Monotonic Search Networks) in which the out-edges of each node are distributed evenly in all directions. Because each node connects to its neighborhood omnidirectionally, SSG has favorable theoretical properties for both indexed and unindexed queries.
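The idea of spreading out-edges evenly in all directions can be illustrated with a minimal sketch of angle-based pruning. This is an illustration under stated assumptions, not the authors' implementation: the function name `prune_by_angle`, the parameter `min_angle_deg`, and the nearest-first greedy order are hypothetical choices made here. A candidate is kept only if the edge to it forms an angle of at least `min_angle_deg` with every edge already kept.

```python
import math


def prune_by_angle(p, candidates, min_angle_deg=60.0):
    """Select out-neighbors of p whose directions are pairwise well separated.

    Hypothetical sketch: candidates are visited nearest-first, and one is kept
    only if its direction from p differs from every kept direction by at least
    min_angle_deg degrees.
    """
    cos_limit = math.cos(math.radians(min_angle_deg))

    def direction(q):
        # Vector from p to q and its Euclidean length.
        v = [b - a for a, b in zip(p, q)]
        return v, math.sqrt(sum(x * x for x in v))

    # Nearest candidates first; sorted() is stable, so ties keep input order.
    ordered = sorted(candidates, key=lambda q: direction(q)[1])

    kept = []  # list of (point, unit direction) pairs
    for q in ordered:
        v, length = direction(q)
        if length == 0.0:
            continue  # skip p itself
        unit = [x / length for x in v]
        # cos(angle) <= cos_limit means the angle is >= min_angle_deg.
        separated = all(
            sum(a * b for a, b in zip(unit, other)) <= cos_limit + 1e-12
            for _, other in kept
        )
        if separated:
            kept.append((q, unit))
    return [q for q, _ in kept]


if __name__ == "__main__":
    center = (0.0, 0.0)
    cands = [(1.0, 0.0), (2.0, 0.1), (0.0, 1.0), (-1.0, 0.0), (1.0, 1.0)]
    print(prune_by_angle(center, cands, min_angle_deg=60.0))
    print(prune_by_angle(center, cands, min_angle_deg=30.0))
```

In this sketch, lowering `min_angle_deg` keeps more edges per node and raising it keeps fewer, which is one way a single hyper-parameter can control the sparsity of the resulting graph.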

Key contributions of the paper include:

  1. Pruning Strategy: The authors present a pruning strategy that produces SSGs from the complete graph, with the aim of improving retrieval efficiency.
  2. Theoretical Properties: The paper provides rigorous theoretical analysis, ensuring that SSG supports effective similarity searches for both indexed and unindexed queries. Additionally, a hyper-parameter is introduced to regulate the graph's sparsity, enabling performance optimization.
  3. Indexing Complexity Reduction: For large-scale applications, NSSG is proposed to reduce the indexing complexity of SSG. It builds on SSG's theoretical foundations while using practical heuristics to scale.
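To make concrete what searching a graph index with an unindexed query involves, here is a minimal sketch of generic greedy routing on a proximity graph. It is not the search routine from the paper: the names `greedy_search`, `points`, `graph`, and `start` are assumptions introduced for illustration, and practical systems typically use a best-first search with a candidate pool instead. The query need not be one of the indexed points.

```python
import math


def dist(a, b):
    """Euclidean distance between two points given as sequences of floats."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def greedy_search(points, graph, query, start=0):
    """Walk the graph, always moving to the neighbor closest to the query.

    points: list of coordinate tuples, indexed by node id
    graph:  dict mapping node id -> list of out-neighbor ids
    Returns (node id where the walk stops, list of visited node ids).
    """
    current = start
    current_d = dist(points[current], query)
    path = [current]
    while True:
        best, best_d = current, current_d
        for nb in graph[current]:
            d = dist(points[nb], query)
            if d < best_d:
                best, best_d = nb, d
        if best == current:
            # No neighbor is closer to the query: a local minimum.
            return current, path
        current, current_d = best, best_d
        path.append(current)


if __name__ == "__main__":
    pts = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0)]
    g = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
    # The query is not one of the indexed points.
    print(greedy_search(pts, g, (2.9, 0.2)))
```

Each step of the walk strictly decreases the distance to the query. Whether the walk ends at the true nearest neighbor depends on the graph's structure, which is where theoretical guarantees for indexed and unindexed queries matter.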

The authors provide both theoretical and extensive experimental analyses indicating that the proposed approach outperforms existing representative algorithms. In practical terms, SSG and NSSG are strong candidates for real-world high-dimensional similarity search tasks that require both efficiency and the ability to handle indexed and unindexed queries.

Looking forward, this work may set the stage for more advanced graph-based approaches, particularly in domains where data continues to grow in both size and dimensionality. Further exploration could also consider hybrid approaches that combine SSG with other indexing methods to draw on the strengths of each. The code released on GitHub makes further experimentation and adaptation possible across research and application domains.