
LotusFilter: Diversifying ANNS Results

Updated 30 June 2025
  • LotusFilter is a post-processing module that diversifies ANNS outputs by removing redundant near-duplicate vectors for enhanced retrieval.
  • It employs a precomputed cutoff table to quickly exclude candidates within a defined squared distance threshold, ensuring both query similarity and diversity.
  • Evaluations show LotusFilter adds only 0.02 ms per query with low memory overhead, making it effective for large-scale, high-dimensional applications.

LotusFilter is a post-processing module designed to diversify the results of approximate nearest neighbor search (ANNS), particularly in high-dimensional vector spaces such as those encountered with modern text and image embeddings. Standard ANNS techniques efficiently retrieve vectors that are closest to a query vector but often yield results that are not only similar to the query but also highly similar to each other, limiting their utility in scenarios—such as image retrieval, recommendation, or retrieval-augmented generation (RAG)—where both relevance and diversity are desired. LotusFilter addresses this shortcoming by providing a fast, deterministic method that eliminates redundant, near-duplicate vectors from ANNS candidate sets, producing outputs that preserve query similarity while ensuring diversity among retrieved items.

1. Post-Processing Diversification in ANNS

LotusFilter serves as a plug-in post-processing step that can be applied after any black-box ANNS retrieval system. Upon receiving an initial set of nearest neighbor candidates, LotusFilter iteratively prunes candidates that are “too close” to each other according to a predefined squared distance threshold $\varepsilon$. This process yields a subset in which each element remains close to the query while being sufficiently distant from every other element, thereby balancing relevance and diversity. Because the top-nearest neighbor to the query is always preserved, the recall of the most relevant item is guaranteed, and diversity is enforced without requiring adjustments to the ANNS backend or embedding models. Typical query processing time attributable to diversification is 0.02 ms per query for candidate sets of size $S=500$ and desired output size $K=100$ over a dataset of $N=9 \times 10^5$ vectors.

2. Cutoff Table Construction and Role

The cutoff table is a precomputed structure central to the efficient operation of LotusFilter. For each vector $\mathbf{x}_n$ in the database, this table records the IDs of all other vectors $\mathbf{x}_i$ such that $\|\mathbf{x}_n - \mathbf{x}_i\|_2^2 < \varepsilon$. This captures the local neighborhood of each vector under the prescribed diversity threshold.

Preprocessing involves performing a range search for each database vector to identify its close neighbors. This process is conducted offline and typically requires less than one minute per million vectors, making it suitable for large-scale deployment. At query time, whenever a candidate vector is selected for inclusion in the diversified results, LotusFilter eliminates all its close (redundant) neighbors from further consideration by referencing the cutoff table. Each elimination is a table lookup whose cost scales with the length of that vector's neighbor list, and no pairwise distances are recomputed at query time, which is critical for high-throughput applications.
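The construction described above can be sketched in a few lines. This is a minimal illustration, assuming a brute-force scan in place of the range search a real deployment would run through an ANNS library; the names `squared_l2` and `build_cutoff_table` and the toy vectors are hypothetical and are not taken from the LotusFilter repository.

```python
from typing import Dict, List, Sequence


def squared_l2(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def build_cutoff_table(
    vectors: Sequence[Sequence[float]], eps: float
) -> Dict[int, List[int]]:
    """For each ID n, list every other ID i with squared distance < eps.

    Brute force, O(N^2) distance evaluations: fine for a toy example,
    but only a stand-in for an offline range search at scale.
    """
    table: Dict[int, List[int]] = {n: [] for n in range(len(vectors))}
    for n in range(len(vectors)):
        for i in range(n + 1, len(vectors)):
            if squared_l2(vectors[n], vectors[i]) < eps:
                # The relation is symmetric, so record it in both rows.
                table[n].append(i)
                table[i].append(n)
    return table


# Toy data (assumed): two tight pairs that are far from each other.
vectors = [(0.0, 0.0), (0.1, 0.0), (1.0, 1.0), (1.0, 1.1)]
cutoff_table = build_cutoff_table(vectors, eps=0.05)
print(cutoff_table)
```

With these toy vectors, IDs 0 and 1 list each other, as do IDs 2 and 3, so selecting one member of a pair at query time removes the other.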

3. Performance Metrics and Comparative Evaluation

LotusFilter has been empirically evaluated using large-scale, high-dimensional datasets—such as 1536-dimensional OpenAI embeddings—for both text and image tasks. Its filtering step (“diversification”) adds only 0.02 ms per query, and the complete search-plus-filtering process totals approximately 1 ms per query, making it nearly as fast as conventional ANNS querying alone.

| Method | Final Score (lower is better) | Total Query Time (ms) | Memory Overhead |
|---|---|---|---|
| Search only | 0.200 | 0.855 | |
| Clustering | 0.223 | 7.88 | High ($32ND$) |
| GMM | 0.177 | 14.4 | High ($32ND$) |
| LotusFilter | 0.171 | 1.03 | Low ($\sim 64LN$ bits) |

Unlike methods based on clustering or Gaussian mixture models (GMM), which impose substantial storage cost by requiring the original high-dimensional vectors ($32ND$ memory overhead), LotusFilter’s predictable overhead is lower and does not grow with dimension $D$ but rather with the average cutoff table length $L$ per vector. This makes it suitable for high-dimensional and large-scale vector databases.
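To make the gap concrete, the two overhead formulas can be evaluated directly. The sketch below reads $32ND$ as a count of bits (32 bits per vector component), which is an assumption, and uses a hypothetical average table length $L = 10$, since the text gives no value for $L$. It also pairs the $N = 9 \times 10^5$ and $D = 1536$ figures quoted elsewhere in this article for illustration only.

```python
def dense_overhead_bits(n: int, d: int) -> int:
    """32ND: storing N original vectors, assuming 32 bits per component."""
    return 32 * n * d


def cutoff_table_overhead_bits(n: int, avg_len: int) -> int:
    """~64LN: the cutoff table figure quoted in the comparison table."""
    return 64 * avg_len * n


N, D = 900_000, 1536  # dataset size and dimension quoted in the article
L = 10                # hypothetical average cutoff table length

dense_gb = dense_overhead_bits(N, D) / 8 / 1e9
table_mb = cutoff_table_overhead_bits(N, L) / 8 / 1e6
print(f"{dense_gb:.2f} GB vs {table_mb:.1f} MB")  # 5.53 GB vs 72.0 MB
```

Under these assumptions the cutoff table is roughly two orders of magnitude smaller than keeping the raw vectors, and the ratio improves further as $D$ grows because the table size does not depend on it.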

4. Practical Integration in Retrieval-Augmented Generation (RAG)

In RAG pipelines, where external text passages or contextual snippets are retrieved to augment LLM outputs, ANNS is standard practice for selecting candidate contexts using embedding similarity. LotusFilter integrates seamlessly as a post-processing layer: following ANNS-based candidate selection, LotusFilter removes near-duplicates, ensuring that the passages provided to the LLM are not only relevant but also diverse. No changes are required to the underlying RAG model, the embedding generation process, or the retrieval index. Experiments in the referenced paper utilize OpenAI’s 1536-dimensional embeddings and demonstrate the removal of near-identical or semantically redundant text among retrieved passages.

5. Algorithmic Formulation and Guarantees

The selection of the $K$-element diversified subset $\mathcal{K}$ from the candidate set $\mathcal{S}$ is formally posed as:

$$\argmin_{\mathcal{K} \subseteq \mathcal{S},\, |\mathcal{K}| = K} f(\mathcal{K})$$

$$f(\mathcal{K}) = \frac{1 - \lambda}{K} \sum_{k \in \mathcal{K}} \|\mathbf{q} - \mathbf{x}_k\|_2^2 - \lambda \min_{i, j \in \mathcal{K},\, i \ne j} \|\mathbf{x}_i - \mathbf{x}_j\|_2^2$$

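Because $f$ is minimized, the first term (the mean squared distance to the query $\mathbf{q}$) rewards query similarity, while the second term rewards a large minimum pairwise squared distance within $\mathcal{K}$ and so penalizes pairs that are too close. The weight $\lambda$ sets the trade-off between the two.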
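A direct evaluation of this objective helps show how it separates a redundant result set from a diverse one. The following is a minimal sketch, assuming plain Python tuples as vectors; the function names, the toy points, and the choice $\lambda = 0.5$ are illustrative assumptions.

```python
from itertools import combinations
from typing import Sequence

Vector = Sequence[float]


def squared_l2(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def objective(query: Vector, selected: Sequence[Vector], lam: float) -> float:
    """f(K) = (1 - lam)/K * sum ||q - x_k||^2 - lam * min_{i != j} ||x_i - x_j||^2.

    Requires at least two selected vectors so the minimum is defined.
    """
    k = len(selected)
    similarity = sum(squared_l2(query, x) for x in selected) / k
    diversity = min(squared_l2(a, b) for a, b in combinations(selected, 2))
    return (1.0 - lam) * similarity - lam * diversity


query = (0.0, 0.0)
redundant = [(1.0, 0.0), (1.0, 0.1)]  # both near the query, nearly identical
diverse = [(1.0, 0.0), (0.0, 1.0)]    # equally near the query, far apart

f_redundant = objective(query, redundant, lam=0.5)
f_diverse = objective(query, diverse, lam=0.5)
print(f_diverse < f_redundant)  # True: the diverse set scores lower (better)
```

Both sets are about equally close to the query, so the difference in score comes almost entirely from the minimum pairwise distance term.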

The filtering proceeds as follows:

  1. Perform ANNS to obtain candidates $\mathcal{S}$.
  2. Iteratively, up to $K$ times:
    • Select and include the current closest remaining candidate.
    • Remove all candidates within $\varepsilon$ (using cutoff table) relative to this selection.
    • Stop when $K$ unique results are found or the candidate set is exhausted (with optional padding safeguards).
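The steps above can be sketched as a single pass over the candidate list. This is an illustrative sketch, not the repository's implementation: it assumes the candidate IDs arrive sorted by increasing distance to the query, represents the cutoff table as a plain dictionary, and omits the optional padding safeguards. The name `lotus_style_filter` is hypothetical.

```python
from typing import Dict, List, Sequence, Set


def lotus_style_filter(
    candidates: Sequence[int],
    cutoff_table: Dict[int, List[int]],
    k: int,
) -> List[int]:
    """Greedy diversification over candidates sorted nearest-first.

    Keeps the closest surviving candidate, then marks every ID listed
    in its cutoff table row as removed. Stops at k results or when the
    candidates run out.
    """
    removed: Set[int] = set()
    selected: List[int] = []
    for cand in candidates:
        if len(selected) == k:
            break
        if cand in removed:
            continue  # too close to something already selected
        selected.append(cand)
        removed.update(cutoff_table.get(cand, ()))
    return selected


# Toy input (assumed): IDs 0/1 and 2/3 are near-duplicate pairs.
candidates = [0, 1, 2, 3, 4]
cutoff_table = {0: [1], 1: [0], 2: [3], 3: [2], 4: []}
result = lotus_style_filter(candidates, cutoff_table, k=3)
print(result)  # [0, 2, 4]
```

The first candidate is always kept, which matches the property that the nearest neighbor of the query survives filtering. The loop visits each candidate once and touches one table row per selected item.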

For all returned pairs $(i, j)$ with $i \ne j$, the guarantee $\|\mathbf{x}_i - \mathbf{x}_j\|_2^2 \geq \varepsilon$ holds, giving a hard lower bound on the pairwise distance between any two returned items.

Computational complexity is $\mathcal{O}(T + S + KL)$, where $T$ is the ANNS cost, $S$ the number of candidates, $K$ the diversified output size, and $L$ the average cutoff table length, with $L \ll N$.

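The hyperparameter $\varepsilon$ can be tuned on training queries using a bracketing search that minimizes the main objective averaged over those queries:

$$\varepsilon^* = \argmin_{\varepsilon} \mathbb{E}_{\mathbf{q} \in \mathcal{Q}_\mathrm{train}}[f^*(\varepsilon, \mathbf{q})]$$

Here $f^*(\varepsilon, \mathbf{q})$ denotes the value of the objective $f$ for the subset returned for query $\mathbf{q}$ under threshold $\varepsilon$.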
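The text does not say which bracketing method is used, so the sketch below substitutes golden-section search, one common bracketing technique for one-dimensional minimization. It assumes the averaged objective is unimodal on the search interval, which may not hold in practice. The `toy_mean_objective` function is a hypothetical stand-in for the real average over training queries.

```python
import math
from typing import Callable


def golden_section_minimize(
    loss: Callable[[float], float], lo: float, hi: float, tol: float = 1e-4
) -> float:
    """Shrink the bracket [lo, hi] around a minimizer of a unimodal loss."""
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0  # about 0.618
    a, b = lo, hi
    c = b - inv_phi * (b - a)
    d = a + inv_phi * (b - a)
    fc, fd = loss(c), loss(d)
    while b - a > tol:
        if fc < fd:
            # Minimum lies in [a, d]; reuse c as the new upper probe.
            b, d, fd = d, c, fc
            c = b - inv_phi * (b - a)
            fc = loss(c)
        else:
            # Minimum lies in [c, b]; reuse d as the new lower probe.
            a, c, fc = c, d, fd
            d = a + inv_phi * (b - a)
            fd = loss(d)
    return (a + b) / 2.0


def toy_mean_objective(eps: float) -> float:
    """Hypothetical stand-in for the mean objective over training queries."""
    return (eps - 0.3) ** 2 + 1.0


best_eps = golden_section_minimize(toy_mean_objective, lo=0.0, hi=1.0)
print(round(best_eps, 3))  # 0.3
```

In a real setting each call to the loss would run the filter over all training queries for the given threshold, so a method that needs only one new evaluation per iteration keeps tuning cost low.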

6. Software Implementation and Availability

The LotusFilter implementation is released under open source terms at https://github.com/matsui528/lotf. Core functionality is in C++17, with accessible Python bindings via nanobind. The repository includes ready-to-use scripts and Jupyter notebooks for every critical step: cutoff table construction, running the filtering process, and hyperparameter training. Explicit usage examples are provided for OpenAI, MS MARCO, and image datasets.

| Aspect | LotusFilter Characteristics |
|---|---|
| Purpose | Fast, post-processing module to diversify ANNS results |
| Input | Candidate IDs from any ANNS + precomputed cutoff table |
| Mechanism | Remove candidates too similar to already selected, by cutoff lookups |
| Speed | Only 0.02 ms/query for filtering, total ~1 ms/query |
| Memory | Predictable, efficient: needs only cutoff table in addition to index |
| RAG | Integrates trivially; tested with OpenAI embeddings and real text data |
| Guarantee | Ensures min distance $\sqrt{\varepsilon}$ among output vectors |
| Code | https://github.com/matsui528/lotf |

7. Implications for Large-Scale Retrieval

LotusFilter enables efficient, scalable diversification of retrieval results across a range of vector-search applications, including text and image domains. Its post-processing design, minimal storage and computational overhead, and independence from the underlying retrieval, embedding, or indexing methods make it adaptable for contemporary systems where both relevance and diversity are requirements. In practical RAG deployments, LotusFilter keeps the retrieved evidence or context non-redundant while always retaining the candidate nearest to the query, which supports downstream LLM reasoning. The pairwise distance guarantee and empirical speed position it as a robust alternative to conventional cluster- or mixture-based post-filters for high-dimensional ANNS-based workflows.