LotusFilter: Diversifying ANNS Results

Updated 30 June 2025

LotusFilter is a post-processing module that diversifies ANNS outputs by removing redundant near-duplicate vectors from the retrieved candidates.
It employs a precomputed cutoff table to quickly exclude candidates that lie within a defined squared distance threshold of an already selected result, preserving query similarity while enforcing diversity.
Evaluations show LotusFilter adds only 0.02 ms per query with low memory overhead, making it effective for large-scale, high-dimensional applications.

LotusFilter is a post-processing module designed to diversify the results of approximate nearest neighbor search (ANNS), particularly in high-dimensional vector spaces such as those encountered with modern text and image embeddings. Standard ANNS techniques efficiently retrieve vectors that are closest to a query vector but often yield results that are not only similar to the query but also highly similar to each other, limiting their utility in scenarios—such as image retrieval, recommendation, or retrieval-augmented generation (RAG)—where both relevance and diversity are desired. LotusFilter addresses this shortcoming by providing a fast, deterministic method that eliminates redundant, near-duplicate vectors from ANNS candidate sets, producing outputs that preserve query similarity while ensuring diversity among retrieved items.

1. Post-Processing Diversification in ANNS

LotusFilter serves as a plug-in post-processing step that can be applied after any black-box ANNS retrieval system. Upon receiving an initial set of nearest neighbor candidates, LotusFilter iteratively prunes candidates that are “too close” to an already selected result, according to a predefined squared distance threshold $\varepsilon$. This process yields a subset in which each element remains close to the query while being sufficiently distant from every other element, thereby balancing relevance and diversity. Because the candidate nearest to the query is always preserved, the top-ranked ANNS result is guaranteed to appear in the output, and diversity is enforced without requiring adjustments to the ANNS backend or embedding models. The diversification step typically takes 0.02 ms per query for candidate sets of size $S=500$ and desired output size $K=100$ over a dataset of $N=9 \times 10^5$ vectors.

2. Cutoff Table Construction and Role

The cutoff table is a precomputed structure central to the efficient operation of LotusFilter. For each vector $\mathbf{x}_n$ in the database, this table records the IDs of all other vectors $\mathbf{x}_i$ such that $\|\mathbf{x}_n - \mathbf{x}_i\|_2^2 < \varepsilon$. This captures the local neighborhood of each vector under the prescribed diversity threshold.

Preprocessing involves performing a range search for each database vector to identify its close neighbors. This process is conducted offline and typically requires less than one minute per million vectors, making it suitable for large-scale deployment. At query time, whenever a candidate vector is selected for inclusion in the diversified results, LotusFilter eliminates all its close (redundant) neighbors from further consideration by referencing the cutoff table. Elimination is therefore a table lookup whose cost is proportional to the number of entries stored for the selected vector, and no pairwise distances are recomputed at query time, which is critical for high-throughput applications.
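To make the table's contents concrete, here is a minimal sketch that builds a cutoff table by brute force. It is an illustration only: the function names `sq_dist` and `build_cutoff_table` are hypothetical, the quadratic pairwise loop stands in for the range search mentioned above, and it is practical only for toy inputs.

```python
from itertools import combinations

def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length sequences."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

def build_cutoff_table(vectors, eps):
    """Map each vector ID n to the sorted IDs i != n with ||x_n - x_i||^2 < eps.

    Hypothetical helper: brute force over all pairs, O(N^2) distance evaluations.
    """
    table = [[] for _ in vectors]
    for n, i in combinations(range(len(vectors)), 2):
        if sq_dist(vectors[n], vectors[i]) < eps:
            # The relation is symmetric, so record it in both rows.
            table[n].append(i)
            table[i].append(n)
    return [sorted(ids) for ids in table]

# Two tight pairs of points that are far apart from each other.
vectors = [(0.0, 0.0), (0.1, 0.0), (1.0, 0.0), (1.0, 0.1)]
table = build_cutoff_table(vectors, eps=0.05)
print(table)  # [[1], [0], [3], [2]]
```

Each row lists only the near-duplicates of one vector, so the table's size depends on how many neighbors fall under the threshold rather than on the vector dimension.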

3. Performance Metrics and Comparative Evaluation

LotusFilter has been empirically evaluated using large-scale, high-dimensional datasets—such as 1536-dimensional OpenAI embeddings—for both text and image tasks. Its filtering step (“diversification”) adds only 0.02 ms per query, and the complete search-plus-filtering process totals approximately 1 ms per query, making it nearly as fast as conventional ANNS querying alone.

Method	Final score (lower is better)	Total query time (ms)	Memory overhead
Search only	0.200	0.855	–
Clustering	0.223	7.88	High ($32ND$ bits)
GMM	0.177	14.4	High ($32ND$ bits)
LotusFilter	0.171	1.03	Low ($\sim 64LN$ bits)

Unlike methods based on clustering or Gaussian mixture models (GMM), which must keep the original high-dimensional vectors in memory ($32ND$ bits for $N$ vectors of dimension $D$), LotusFilter needs only the cutoff table, about $64LN$ bits, where $L$ is the average cutoff table length per vector. Its overhead is therefore predictable and does not grow with the dimension $D$, which makes it suitable for high-dimensional and large-scale vector databases.
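The two memory formulas can be compared with a short calculation. In the sketch below, $N$ and $D$ reuse figures quoted elsewhere in this article, while the average table length `L_avg = 10` is a purely hypothetical value chosen for illustration; the function names are also illustrative.

```python
def vector_storage_bits(n, d, bits_per_value=32):
    """Bits needed to keep n vectors of dimension d (the 32ND term)."""
    return bits_per_value * n * d

def cutoff_table_bits(n, avg_len, bits_per_id=64):
    """Approximate bits needed for a cutoff table (the ~64LN term)."""
    return bits_per_id * avg_len * n

N = 900_000   # dataset size quoted in this article
D = 1536      # embedding dimension quoted in this article
L_avg = 10    # hypothetical average cutoff table length (assumption)

BITS_PER_GIB = 8 * 2**30
vec_bits = vector_storage_bits(N, D)
tab_bits = cutoff_table_bits(N, L_avg)
print(f"vectors:      {vec_bits / BITS_PER_GIB:.2f} GiB")
print(f"cutoff table: {tab_bits / BITS_PER_GIB:.3f} GiB")
print(f"ratio:        {vec_bits / tab_bits:.1f}x")
```

The ratio reduces to $32D / (64L)$, so under these assumptions the table is smaller whenever $L < D/2$, independent of $N$.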

4. Practical Integration in Retrieval-Augmented Generation (RAG)

In RAG pipelines, where external text passages or contextual snippets are retrieved to augment LLM outputs, ANNS is standard practice for selecting candidate contexts using embedding similarity. LotusFilter integrates seamlessly as a post-processing layer: following ANNS-based candidate selection, LotusFilter removes near-duplicates, ensuring that the passages provided to the LLM are not only relevant but also diverse. No changes are required to the underlying RAG model, the embedding generation process, or the retrieval index. Experiments in the referenced paper utilize OpenAI’s 1536-dimensional embeddings and demonstrate the removal of near-identical or semantically redundant text among retrieved passages.

5. Algorithmic Formulation and Guarantees

The selection of the $K$-element diversified subset $\mathcal{K}$ from candidate set $\mathcal{S}$ is formally posed as:

$\argmin_{\mathcal{K} \subseteq \mathcal{S}, |\mathcal{K}| = K} f(\mathcal{K})$

$f(\mathcal{K}) = \frac{1 - \lambda}{K} \sum_{k \in \mathcal{K}} \|\mathbf{q} - \mathbf{x}_k\|_2^2 - \lambda \min_{i, j \in \mathcal{K},\, i \ne j} \|\mathbf{x}_i - \mathbf{x}_j\|_2^2$

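Here $\lambda \in [0, 1]$ weights the two terms. Minimizing the first term keeps the selected vectors close to the query; the second term, which enters with a negative sign, rewards a large minimum pairwise squared distance and so discourages near-duplicate pairs.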
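As a worked illustration of the objective, the following sketch evaluates $f(\mathcal{K})$ for two small hand-made result sets. The function names and the toy points are assumptions made for this example, not part of the released code.

```python
from itertools import combinations

def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length sequences."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

def objective(query, selected, lam):
    """f(K) = (1 - lam)/K * sum_k ||q - x_k||^2 - lam * min_{i != j} ||x_i - x_j||^2."""
    k = len(selected)
    if k < 2:
        raise ValueError("the diversity term needs at least two selected vectors")
    relevance = sum(sq_dist(query, x) for x in selected) / k
    diversity = min(sq_dist(a, b) for a, b in combinations(selected, 2))
    return (1.0 - lam) * relevance - lam * diversity

q = (0.0, 0.0)
duplicates = [(1.0, 0.0), (1.0, 0.0)]  # equally relevant, but identical
spread = [(1.0, 0.0), (0.0, 1.0)]      # equally relevant, far apart

print(objective(q, duplicates, lam=0.5))  # 0.5
print(objective(q, spread, lam=0.5))      # -0.5
```

Both sets are equally close to the query, so the difference in $f$ comes entirely from the diversity term: the duplicated pair has minimum pairwise distance 0, while the spread pair has squared distance 2.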

The filtering proceeds as follows:

- Perform ANNS to obtain candidates $\mathcal{S}$.
- Repeat up to $K$ times:
  - Select and include the current closest remaining candidate.
  - Remove all candidates within $\varepsilon$ of this selection (using the cutoff table).
- Stop when $K$ results are found or the candidate set is exhausted (with optional padding safeguards).

For all returned pairs $(i, j)$ with $i \ne j$, the guarantee $\|\mathbf{x}_i - \mathbf{x}_j\|_2^2 \geq \varepsilon$ holds, which places a strict lower bound on the distance between any two results.
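The filtering loop can be sketched in a few lines of Python. This is not the repository's implementation: `lotus_filter_sketch` is a hypothetical name, the candidates are assumed to arrive sorted by ascending distance to the query, the cutoff table is assumed to be a mapping from ID to a list of near-duplicate IDs, and the optional padding safeguards are omitted, so fewer than $K$ results may be returned.

```python
def lotus_filter_sketch(candidates, cutoff_table, k):
    """Greedy diversification over candidate IDs sorted by distance to the query.

    cutoff_table[i] lists the IDs within squared distance eps of vector i.
    Returns at most k IDs; no padding is applied if candidates run out.
    """
    removed = set()   # IDs eliminated as near-duplicates of a selected result
    selected = []
    for cand in candidates:
        if len(selected) == k:
            break
        if cand in removed:
            continue
        selected.append(cand)
        # Table lookup: drop every near-duplicate of the new selection.
        removed.update(cutoff_table[cand])
    return selected

# Toy table: 0 and 1 are near-duplicates, as are 2 and 3; 4 has none.
cutoff_table = {0: [1], 1: [0], 2: [3], 3: [2], 4: []}
candidates = [0, 1, 2, 3, 4]  # already sorted by distance to the query

print(lotus_filter_sketch(candidates, cutoff_table, k=3))  # [0, 2, 4]
```

The first candidate is never in the removed set, so the nearest candidate always survives, and any ID that appears in the row of a selected result is skipped, which is what yields the pairwise distance guarantee.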

Computational complexity is $\mathcal{O}(T + S + KL)$, where $T$ is the ANNS cost, $S$ the number of candidates, $K$ the diversified output size, and $L$ the average cutoff table length, with $L \ll N$.

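The hyperparameter $\varepsilon$ can be tuned on training queries with a bracketing search that minimizes the expected objective, where $f^*(\varepsilon, \mathbf{q})$ denotes the objective value of the filtered result for query $\mathbf{q}$ at threshold $\varepsilon$: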

$\varepsilon^* = \argmin_{\varepsilon} \mathbb{E}_{\mathbf{q} \in \mathcal{Q}_\mathrm{train}}[f^*(\varepsilon, \mathbf{q})]$
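The text does not say which bracketing method is used, so the sketch below substitutes golden-section search as one common example. It assumes the averaged objective is unimodal on the bracket, which may not hold in practice, and it uses a toy quadratic in place of the real training objective; all names are hypothetical.

```python
import math

def golden_section_minimize(func, lo, hi, tol=1e-4):
    """Shrink the bracket [lo, hi] around a minimizer of a unimodal func."""
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    c = b - inv_phi * (b - a)
    d = a + inv_phi * (b - a)
    fc, fd = func(c), func(d)
    while b - a > tol:
        if fc < fd:
            # The minimum lies in [a, d]; the old c becomes the new d.
            b, d, fd = d, c, fc
            c = b - inv_phi * (b - a)
            fc = func(c)
        else:
            # The minimum lies in [c, b]; the old d becomes the new c.
            a, c, fc = c, d, fd
            d = a + inv_phi * (b - a)
            fd = func(d)
    return (a + b) / 2.0

def toy_mean_objective(eps):
    """Stand-in for the mean of f*(eps, q) over training queries (assumption)."""
    return (eps - 0.3) ** 2

best_eps = golden_section_minimize(toy_mean_objective, lo=0.0, hi=1.0)
print(round(best_eps, 3))  # 0.3
```

In a real setting, `toy_mean_objective` would be replaced by a function that rebuilds the cutoff table for the given threshold, runs the filter on each training query, and averages the resulting objective values.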

6. Software Implementation and Availability

The LotusFilter implementation is released under open source terms at https://github.com/matsui528/lotf. Core functionality is in C++17, with accessible Python bindings via nanobind. The repository includes ready-to-use scripts and Jupyter notebooks for every critical step: cutoff table construction, running the filtering process, and hyperparameter training. Explicit usage examples are provided for OpenAI, MS MARCO, and image datasets.

Aspect	LotusFilter Characteristics
Purpose	Fast, post-processing module to diversify ANNS results
Input	Candidate IDs from any ANNS + precomputed cutoff table
Mechanism	Remove candidates too similar to already selected, by cutoff lookups
Speed	Only 0.02 ms/query for filtering, total ~1 ms/query
Memory	Predictable, efficient: needs only cutoff table in addition to index
RAG	Integrates trivially; tested with OpenAI embeddings and real text data
Guarantee	Ensures min distance $\sqrt{\varepsilon}$ among output vectors
Code	https://github.com/matsui528/lotf

7. Implications for Large-Scale Retrieval

LotusFilter enables efficient, scalable diversification of retrieval results across a range of vector-search applications, including text and image domains. Its post-processing design, low storage and computational overhead, and independence from the underlying retrieval, embedding, or indexing methods make it adaptable for contemporary systems where both relevance and diversity are requirements. In practical RAG deployments, LotusFilter keeps retrieved evidence or context relevant and non-redundant while always retaining the top-ranked candidate, which supports downstream LLM reasoning. The pairwise distance guarantee and the measured speed position it as a robust alternative to conventional cluster- or mixture-based post-filters for high-dimensional ANNS-based workflows.
