Max-Min Diversity (DivPrune): Methods & Applications
- Max-Min Diversity (DivPrune) is a method for selecting highly diverse subsets by maximizing the minimum pairwise distance among elements, ensuring robust representation in both visual and combinatorial contexts.
- A greedy algorithm iteratively adds the element with the highest minimum distance to the current subset, effectively reducing redundancy and achieving significant pruning rates with minimal accuracy loss.
- The approach leverages kernelization and fixed-parameter tractable strategies, enabling efficient optimization and extending its applicability to tasks such as visual token pruning and combinatorial clustering.
Max-Min Diversity (DivPrune) refers to a class of algorithmic and optimization approaches for selecting highly diverse subsets from a larger collection, with core applications ranging from visual token pruning in Large Multimodal Models (LMMs) to combinatorial diversification and clustering in discrete domains. The defining criterion is to maximize the minimum pairwise distance—measured either by geometric metrics among embeddings or set-based metrics such as Hamming distance—among selected elements. This principle yields robustly representative, minimally redundant subset selections, critical for inference efficiency, memory reduction, and modeling accuracy across diverse fields.
1. Formalization as Max-Min Diversity Problems
Let $V = \{v_1, \dots, v_n\}$ denote the set of candidate objects (e.g., visual token embeddings in $\mathbb{R}^d$, or subsets of a ground set $U$). For a target subset size $k$ (together with a distance threshold $d$ in discrete settings), the general max-min diversity problem aims to find $S \subseteq V$, $|S| = k$, such that the minimum pairwise distance within $S$ is maximized:

$$S^{*} = \operatorname*{arg\,max}_{S \subseteq V,\ |S| = k} \; \min_{\substack{u, v \in S \\ u \neq v}} \operatorname{dist}(u, v)$$
The choice of $\operatorname{dist}$ varies by context:
- For visual embeddings, $\operatorname{dist}(u, v) = 1 - \frac{\langle u, v\rangle}{\|u\|\,\|v\|}$ (cosine distance) is standard (Alvar et al., 4 Mar 2025).
- For sets $A, B \subseteq U$, $\operatorname{dist}(A, B) = |A \,\triangle\, B|$ (Hamming distance) is used (Kumabe, 2024).
This problem is combinatorially hard (NP-hard), but admits efficient approximations and fixed-parameter tractable (FPT) solutions in structured domains.
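Since exact solutions require exhaustive search in general, they are feasible only for toy instances. A brute-force sketch in Python (helper names are illustrative) makes the objective concrete:

```python
import itertools
import numpy as np

def maxmin_objective(points, subset_idx):
    """Minimum pairwise Euclidean distance within the chosen subset."""
    return min(np.linalg.norm(points[i] - points[j])
               for i, j in itertools.combinations(subset_idx, 2))

def brute_force_maxmin(points, k):
    """Exact max-min diversity via exhaustive search: O(C(n, k)) subsets,
    so this is viable only for very small n."""
    best = max(itertools.combinations(range(len(points)), k),
               key=lambda s: maxmin_objective(points, s))
    return list(best), maxmin_objective(points, best)

# Example: pick the 3 most mutually spread of 8 random 2-D points.
rng = np.random.default_rng(0)
pts = rng.standard_normal((8, 2))
subset, score = brute_force_maxmin(pts, 3)
print(subset, score)
```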
2. Greedy Max–Min Algorithms for Visual Token Pruning
In the context of large multimodal models, DivPrune reformulates the pruning of visual tokens as a max-min diversity instance, enforcing maximal feature spread in the retained set. The operational workflow comprises:
- Compute the pairwise distance matrix $D \in \mathbb{R}^{n \times n}$, $D_{ij} = \operatorname{dist}(v_i, v_j)$.
- Iteratively construct the retained set $S$ via greedy selection:
  - Seed: choose the token whose distance to its nearest neighbor in $V$ is largest.
  - Grow: at each step, add the candidate $v \in V \setminus S$ whose minimum distance to $S$ is maximized.
- Retain the $k$ selected tokens and prune the rest.
Once the distance matrix is available, the greedy selection runs in $O(nk)$ time; for typical values of $n$ and $k$, this latency is negligible relative to LLM inference (Alvar et al., 4 Mar 2025). A minimal sketch of the loop appears below.
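A NumPy sketch of the loop just described, assuming cosine distance over the token embeddings (all names are illustrative, not the actual DivPrune implementation):

```python
import numpy as np

def divprune_greedy(tokens, k):
    """Greedy max-min selection over (n, d) token embeddings.
    Returns the indices of the k retained tokens."""
    # Cosine distance matrix: 1 - pairwise dot products of normalized rows.
    x = tokens / np.linalg.norm(tokens, axis=1, keepdims=True)
    dist = 1.0 - x @ x.T
    np.fill_diagonal(dist, np.inf)            # ignore self-distances

    # Seed: the token whose nearest neighbor is farthest away.
    selected = [int(np.argmax(dist.min(axis=1)))]

    # Grow: repeatedly add the token with the largest min distance to S.
    min_to_S = dist[selected[0]].copy()
    for _ in range(k - 1):
        min_to_S[selected] = -np.inf          # never re-pick selected tokens
        nxt = int(np.argmax(min_to_S))
        selected.append(nxt)
        min_to_S = np.minimum(min_to_S, dist[nxt])
    return selected
```

Maintaining the running minimum `min_to_S` is what keeps the loop at $O(nk)$ rather than recomputing all pairwise minima at every step.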
This strategy avoids clusters of near-duplicate embeddings, resulting in a diverse set that collectively preserves more information even under aggressive pruning (up to 86%)—a property not achieved by attention-based or random token removal.
3. Max-Distance Sparsification and FPT Algorithms in Combinatorial Domains
The max-min diversity paradigm generalizes to combinatorial set systems via the concept of a max-distance sparsifier (Kumabe, 2024).
Given a set family $\mathcal{F} \subseteq 2^U$, a solution size $k$, and a Hamming distance threshold $d$, the goal is to select $S_1, \dots, S_k \in \mathcal{F}$ such that $|S_i \,\triangle\, S_j| \ge d$ for all $i \neq j$.
A $d$-max-distance sparsifier is a subfamily $\mathcal{F}' \subseteq \mathcal{F}$ such that feasible solutions in $\mathcal{F}$ are preserved in $\mathcal{F}'$, allowing the reduction of the search space to a kernel of size $f(d, \ell)$, where $\ell$ bounds each $|S_i|$.
Construction Approaches:
- First parameterization: use Exact Empty Extension Oracles and sunflower-based pruning, yielding FPT-time algorithms.
- Second parameterization: cluster $\mathcal{F}$ into a bounded number of Hamming balls or locate a $d$-scattered family; employ Approximate Far Set Oracles and local reductions, with FPT overall complexity.
These methods instantiate the core DivPrune philosophy: kernelize the search problem, then brute-force over the compacted kernel.
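A toy illustration of that pattern, assuming a sparsifier has already produced a small kernel (all names are illustrative):

```python
import itertools

def hamming(a: frozenset, b: frozenset) -> int:
    """Hamming distance between sets = size of the symmetric difference."""
    return len(a ^ b)

def diverse_from_kernel(kernel, k, d):
    """Brute-force over a small sparsified kernel: return k sets that are
    pairwise >= d apart in Hamming distance, or None if none exist."""
    for combo in itertools.combinations(list(kernel), k):
        if all(hamming(a, b) >= d
               for a, b in itertools.combinations(combo, 2)):
            return combo
    return None

# Toy kernel over the ground set {1..5}.
kernel = [frozenset(s) for s in ({1, 2}, {3, 4}, {1, 5}, {2, 3, 4})]
print(diverse_from_kernel(kernel, k=2, d=3))   # ({1, 2}, {3, 4})
```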
4. Integration and Applications
Multimodal Models
DivPrune can be inserted into any LMM pipeline without fine-tuning:
- Compute vision embeddings for the $n$ visual tokens with the model's visual encoder.
- Perform greedy max-min selection to retain $k$ tokens.
- Prune the remaining $n - k$ tokens.
- Concatenate the retained visual tokens with text tokens and input to the LLM (Alvar et al., 4 Mar 2025); a schematic sketch follows this list.
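The placement can be sketched as follows, reusing `divprune_greedy` from the sketch in Section 2 (`vision_encoder`, `llm`, and `keep_ratio` are placeholders, not the actual DivPrune API):

```python
import numpy as np

def lmm_forward_with_divprune(image, text_embeds, vision_encoder, llm,
                              keep_ratio=0.14):
    """Hypothetical LMM forward pass with DivPrune inserted before the LLM.

    vision_encoder: image -> (n, d) visual token embeddings.
    text_embeds:    (m, d) text token embeddings.
    keep_ratio=0.14 corresponds to roughly 86% pruning.
    """
    visual = vision_encoder(image)                     # (n, d)
    k = max(1, round(keep_ratio * visual.shape[0]))
    keep = divprune_greedy(visual, k)                  # greedy max-min indices
    pruned = visual[sorted(keep)]                      # keep original token order
    return llm(np.concatenate([pruned, text_embeds], axis=0))
```

No fine-tuning is involved: the selection is purely a pre-processing step on the frozen encoder's outputs.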
This plug-and-play nature preserves inference accuracy while drastically reducing end-to-end latency and GPU memory requirements. Large-scale benchmark experiments demonstrate that DivPrune achieves minimal performance degradation at high pruning rates, outperforming attention-importance and random pruning baselines.
Combinatorial Optimization
The max-min sparsification framework is applicable to:
- Linear matroid intersection
- Almost-2-SAT
- Minimum-edge $s$-$t$ flows
- Vertex sets of minimum $s$-$t$ cuts
- Edge bipartization
- Steiner trees
Each domain requires a domain-specific oracle (exact extension, optimization, or dynamic-programming oracles), but the underlying kernelization yields FPT algorithms in the respective parameters, often achieving the first such results for those domains (Kumabe, 2024).
5. Experimental and Theoretical Insights
| Setting | Pruning Ratio | Accuracy Drop | Latency Reduction | State-of-the-Art Comparison |
|---|---|---|---|---|
| LLaVA 1.5-7B (image) | 84% | –12.7% CIDEr, –12% OKVQA | N/A | Outperforms FastV/VTW by >80% retention |
| LLaVA-NeXT-Video-7B (video) | 86% | –2% ActivityNet, –1.7% SeedBench | –22% latency | +8–19% absolute over FastV/VTW |
DivPrune maintains over 80% of original accuracy at TFLOP ratios as low as 10%, while competing methods collapse. Ablation studies confirm that maximizing the minimum distance, rather than randomization or minimizing the maximum distance, achieves the optimal trade-off. Cosine, $L_1$, and $L_2$ distance metrics perform equivalently (within 0.4%) (Alvar et al., 4 Mar 2025).
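Under the reading that the two unnamed metrics are $L_1$ and $L_2$ (an assumption; see the reconstruction above), the three distance matrices are drop-in replacements for one another in the greedy selector:

```python
import numpy as np

# Each returns an (n, n) distance matrix for the rows of x; any of them can
# feed the greedy max-min loop. (The L1/L2 names are assumed from context.)
def cosine_dist(x):
    xn = x / np.linalg.norm(x, axis=1, keepdims=True)
    return 1.0 - xn @ xn.T

def l1_dist(x):
    return np.abs(x[:, None, :] - x[None, :, :]).sum(axis=-1)

def l2_dist(x):
    return np.linalg.norm(x[:, None, :] - x[None, :, :], axis=-1)
```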
Theoretical results show that sunflower lemma-based sparsification and clustering reductions yield tower-type FPT complexities, but enable tractability in previously hard domains (Kumabe, 2024).
6. Extensions and Related Problems
The max-min diversity framework admits numerous extensions:
- Max-sum diversification: Maximizing the sum of pairwise distances, solved by reductions to suitably limited max-distance sparsifiers (Kumabe, 2024); a greedy heuristic sketch appears after this list.
- $k$-center and $k$-sum-of-radii clustering: Kernelization and partition-based algorithms for clustering tasks using max-distance sparsifiers.
- Layer-wise integration: Pruning at LLM input (Layer 0) is most effective.
- Alternative objectives: "Min–Max" and random selection strategies are empirically and theoretically inferior, incurring 15–30% performance drops (Alvar et al., 4 Mar 2025).
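For contrast with the max-min objective, max-sum diversification admits a simple greedy heuristic, sketched below (an illustration only, not the exact sparsifier-based algorithm of (Kumabe, 2024)):

```python
import numpy as np

def greedy_max_sum(points, k):
    """Greedy heuristic for max-sum diversification: repeatedly add the
    point with the largest total distance to the already-selected set."""
    dist = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
    selected = [int(np.argmax(dist.sum(axis=1)))]   # start at the most outlying point
    for _ in range(k - 1):
        total = dist[:, selected].sum(axis=1)       # sum of distances to S
        total[selected] = -np.inf                   # exclude already-selected
        selected.append(int(np.argmax(total)))
    return selected
```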
The concept of constant-size max-distance sparsification underlies efficient algorithms for both diversification and clustering, facilitating significant advances in both practical model optimization and theoretical parameterized complexity.
7. Significance and Impact
Max-min diversity (DivPrune) formulations unify a spectrum of selection and sparsification problems under a principled, distance-maximizing framework. In LMM inference, DivPrune delivers state-of-the-art accuracy-efficiency trade-offs with zero fine-tuning, generalizing seamlessly across image and video modalities. In combinatorial optimization, the max-distance sparsifier mechanism enables tractable exact algorithms for diversification and clustering across broad domains, providing both practical tools and foundational complexity-theoretic advances (Alvar et al., 4 Mar 2025, Kumabe, 2024).