DocPruner: A Storage-Efficient Framework for Multi-Vector Visual Document Retrieval via Adaptive Patch-Level Embedding Pruning

Published 28 Sep 2025 in cs.CL and cs.IR | (2509.23883v1)

Abstract: Visual Document Retrieval (VDR), the task of retrieving visually-rich document pages using queries that combine visual and textual cues, is crucial for numerous real-world applications. Recent state-of-the-art methods leverage Large Vision-Language Models (LVLMs) in a multi-vector paradigm, representing each document as patch-level embeddings to capture fine-grained details. While highly effective, this approach introduces a critical challenge: prohibitive storage overhead, as storing hundreds of vectors per page makes large-scale deployment costly and impractical. To address this, we introduce DocPruner, the first framework to employ adaptive patch-level embedding pruning for VDR to effectively reduce the storage overhead. DocPruner leverages the intra-document patch attention distribution to dynamically identify and discard redundant embeddings for each document. This adaptive mechanism enables a significant 50-60% reduction in storage for leading multi-vector VDR models with negligible degradation in document retrieval performance. Extensive experiments across more than ten representative datasets validate that DocPruner offers a robust, flexible, and effective solution for building storage-efficient, large-scale VDR systems.


Authors (6)

Summary

The paper introduces a novel adaptive patch-level embedding pruning method to reduce storage overhead by dynamically discarding redundant embeddings.
The paper employs intra-document patch attention distribution and adaptive thresholding based on statistical properties to maintain retrieval performance.
The paper demonstrates a consistent 50-60% reduction in storage requirements across over ten datasets without significant accuracy loss.


Introduction

The paper "DocPruner: A Storage-Efficient Framework for Multi-Vector Visual Document Retrieval via Adaptive Patch-Level Embedding Pruning" (2509.23883) presents an approach to Visual Document Retrieval (VDR) that addresses the storage overhead of current multi-vector VDR systems. VDR is crucial for applications ranging from e-commerce search to academic resource retrieval. Traditional OCR-based methods struggle to preserve layout and visual elements, which has led to a shift towards Large Vision-Language Models (LVLMs) that represent documents as patch-level embeddings. Despite their effectiveness, these models introduce significant storage demands, because each document page is stored as many vectors rather than one.

Proposed Framework

DocPruner introduces adaptive patch-level embedding pruning to mitigate the storage costs of VDR systems. The framework uses the intra-document patch attention distribution to dynamically identify and discard redundant embeddings, reducing storage requirements by 50-60% while maintaining retrieval performance. Attention scores serve as a measure of patch importance, and the pruning threshold is set per document from the statistical properties of that document's attention scores rather than fixed globally.
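To make the storage claim concrete, the following is a rough back-of-the-envelope sketch of how raw embedding storage scales with the number of vectors kept per page. The corpus size, vectors per page, embedding dimension, and float16 precision are all hypothetical values chosen for illustration; none of them come from the paper, and index overhead is ignored.

```python
def index_size_bytes(num_pages: int, vectors_per_page: float,
                     dim: int, bytes_per_value: int) -> float:
    """Raw embedding storage for a multi-vector index (no index overhead)."""
    return num_pages * vectors_per_page * dim * bytes_per_value


# Hypothetical corpus and model shape, for illustration only.
NUM_PAGES = 1_000_000
VECTORS_PER_PAGE = 1024  # assumed number of patch embeddings per page
DIM = 128                # assumed embedding dimension
BYTES_PER_VALUE = 2      # assumed float16 storage

full = index_size_bytes(NUM_PAGES, VECTORS_PER_PAGE, DIM, BYTES_PER_VALUE)

# The 50-60% range is the reduction reported in the abstract.
for pruning_ratio in (0.5, 0.6):
    kept_vectors = VECTORS_PER_PAGE * (1 - pruning_ratio)
    pruned = index_size_bytes(NUM_PAGES, kept_vectors, DIM, BYTES_PER_VALUE)
    print(f"pruning {pruning_ratio:.0%}: "
          f"{full / 1e9:.1f} GB -> {pruned / 1e9:.1f} GB")
```

Because storage is linear in the number of vectors kept, the fraction of patches pruned translates directly into the fraction of storage saved.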

Figure 1: Comparison between the OCR-based (a) and LVLM-based (b) paradigms for VDR, and DocPruner (c), a novel framework to adaptively prune the patch-level embeddings for diverse document types.

Methodology

The methodology first represents both queries and documents as sets of embeddings: query tokens become token-level embeddings, and document pages become patch-level embeddings. The relevance between a query and a document is then calculated using a MaxSim operation across these embeddings. DocPruner adds a pruning step on the document side. It scores each patch by the attention directed from a global token, typically the [EOS] token, to that patch. It then applies an adaptive threshold informed by the mean and variance of these importance scores, so that the amount of pruning adjusts to the characteristics of each document.
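The two steps described above can be sketched in a few lines of plain Python. This is a minimal illustration, not the authors' implementation: the function names, the toy embeddings, and the importance scores are made up, and the threshold form `mean + k * std` with a tunable `k` is an assumption about how the mean and spread of the scores are combined. In a real system the importance scores would come from the model's attention weights from the global token to each patch.

```python
from statistics import mean, pstdev


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def maxsim(query_embs, doc_embs):
    """Late-interaction score: for each query token take its best-matching
    document vector, then sum those maxima over all query tokens."""
    return sum(max(dot(q, d) for d in doc_embs) for q in query_embs)


def adaptive_prune(doc_embs, importance, k=0.0):
    """Keep patches whose importance exceeds a per-document threshold.

    Assumption: threshold = mean + k * std of this document's scores.
    """
    threshold = mean(importance) + k * pstdev(importance)
    kept = [e for e, s in zip(doc_embs, importance) if s > threshold]
    if not kept:
        # Guard (an assumption of this sketch): never return an empty
        # document; fall back to the single most important patch.
        best = max(range(len(importance)), key=importance.__getitem__)
        kept = [doc_embs[best]]
    return kept


# Toy data: 4 patch embeddings and hypothetical [EOS]-to-patch attention.
doc_embs = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5], [0.1, 0.1]]
importance = [0.40, 0.35, 0.20, 0.05]
query_embs = [[1.0, 0.0], [0.0, 1.0]]

pruned = adaptive_prune(doc_embs, importance, k=0.0)
print(len(doc_embs), "->", len(pruned), "patches")
print(maxsim(query_embs, doc_embs), maxsim(query_embs, pruned))
```

In this toy case the two discarded patches never win a MaxSim comparison, so the score is unchanged after pruning. Because the threshold is computed per document, a page with a few dominant patches is pruned more aggressively than one whose attention is spread evenly.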

Experimental Results

Experiments across more than ten datasets show that DocPruner achieves substantial storage savings without significant degradation in retrieval performance. The results indicate that the approach is robust and flexible enough for large-scale VDR systems, which matters for practical deployment. The figures compare retrieval performance against baselines and show pruning ratio distributions, highlighting the framework's ability to maintain high retrieval performance across diverse datasets while achieving significant storage compression.

Figure 2: Performance comparison (nDCG@5) between DocPruner and baselines on the ViDoRe-V2 benchmark.
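For readers unfamiliar with the nDCG@5 metric used in Figure 2, here is a minimal sketch of the standard definition with linear gain. It assumes the ranked list contains every judged document for the query, so the ideal ordering can be derived by sorting it; the benchmark's own evaluation code may differ in details such as the gain function.

```python
import math


def dcg_at_k(relevances, k):
    """Discounted cumulative gain over the top-k results (linear gain)."""
    return sum(rel / math.log2(rank + 2)
               for rank, rel in enumerate(relevances[:k]))


def ndcg_at_k(ranked_relevances, k=5):
    """DCG of the given ranking divided by the DCG of the ideal ranking."""
    ideal = dcg_at_k(sorted(ranked_relevances, reverse=True), k)
    return dcg_at_k(ranked_relevances, k) / ideal if ideal > 0 else 0.0


# Relevance labels of retrieved pages, in ranked order (toy example).
print(ndcg_at_k([1, 0, 0, 0, 0]))  # relevant page ranked first
print(ndcg_at_k([0, 1, 0, 0, 0]))  # relevant page ranked second
```

A pruned index that still ranks the relevant page first keeps the same score, which is why the metric is a reasonable check that pruning has not displaced the relevant pages from the top results.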

Implications and Future Directions

DocPruner's contributions are multifaceted: pioneering adaptive pruning in VDR, providing a tailored approach for diverse document types, and validating its method through comprehensive benchmarks. The framework opens avenues for more efficient multimodal retrieval systems, reducing operational costs and expanding accessibility. Future work may integrate pruning mechanisms directly into model training or explore adaptive principles for other modalities to enhance real-world applications. This framework represents a step towards making high-performance VDR systems practical and scalable for various industries and sectors, potentially reshaping how document retrieval technologies are deployed at scale.

Conclusion

DocPruner addresses the critical storage overhead challenge in multi-vector VDR systems by introducing adaptive patch-level embedding pruning. This framework effectively reduces storage demands while preserving retrieval accuracy, leveraging attention mechanisms to dynamically prune document embeddings based on importance scores. Extensive testing across diverse datasets validates its efficacy, positioning DocPruner as a pivotal development for storage-efficient VDR systems. As the field of document retrieval continues to evolve, DocPruner paves the way for practical deployments of fine-grained, multimodal understanding technologies at unprecedented scales.
