Vendi Entropy in Diversity Quantification

Updated 17 April 2026

Vendi entropy is a diversity measure that uses pairwise kernel similarities and eigenvalue decomposition to quantify the effective number of distinct modes in a dataset.
It generalizes classical metrics like Shannon and von Neumann entropy through a tunable Hill/Rényi order parameter that adjusts its sensitivity to rare versus dominant modes.
Its applications range from improving generative model diversity in machine learning to evaluating ecological, genomic, and information-theoretic datasets using both exact and approximate computation methods.

Vendi entropy, whose exponential is known as the Vendi score, is a similarity-sensitive, kernel-based generalization of Shannon entropy designed to quantify diversity in finite sets of objects. Unlike classical entropy metrics, which consider only occurrence frequencies, Vendi entropy incorporates pairwise similarities among items via positive-semidefinite kernels. It therefore measures diversity in a way that interpolates between counting distinct elements and grouping similar ones. It is mathematically grounded in quantum statistical mechanics, extending von Neumann entropy to general metric and data domains, and admits a family of generalizations controlled by a Hill/Rényi order parameter. Vendi entropy has found use across machine learning, ecology, genomics, and information theory, notably as a reference-free metric for evaluating diversity in generative models, experiment design, and dataset analysis.

1. Definition and Mathematical Formalism

Consider a finite set $\mathcal{X} = \{x_1, \ldots, x_n\}$ and a symmetric, positive semidefinite (PSD) similarity kernel $k:\mathcal{X}\times\mathcal{X}\to\mathbb{R}_{\geq 0}$, normalized such that $k(x,x)=1$ for all $x$. The corresponding $n \times n$ Gram (similarity) matrix $K$ is given by $K_{ij} = k(x_i, x_j)$, and its normalized version, termed the density matrix, is $\rho = \frac{K}{\operatorname{Tr}(K)}$. Since $k(x,x)=1$ implies $\operatorname{Tr}(K)=n$, this is simply $\rho = K/n$.

Let $\lambda_1, \ldots, \lambda_n$ denote the eigenvalues of $\rho$, which are nonnegative and sum to $1$. The Vendi entropy of order $q \geq 0$ is defined as

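$H_q(\rho) = \frac{1}{1-q}\log\Big(\sum_{i=1}^{n} \lambda_i^{q}\Big) \quad (q \neq 1), \qquad H_1(\rho) = \lim_{q \to 1} H_q(\rho) = -\sum_{i=1}^{n} \lambda_i \log \lambda_i,$ with the convention $0 \log 0 = 0$.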

The associated Vendi score (effective number) is the exponential of the entropy, which is equivalently its Hill-type transformation: $\mathrm{VS}_q = \exp\big(H_q(\rho)\big) = \big(\sum_{i=1}^{n} \lambda_i^{q}\big)^{1/(1-q)}$ for $q \neq 1$, and $\mathrm{VS}_1 = \exp\big(-\sum_{i=1}^{n} \lambda_i \log \lambda_i\big)$. This formalism recovers von Neumann entropy when $\rho$ is the density matrix of a quantum system. It recovers classical Shannon entropy (and the associated Hill numbers) when $K$ is diagonal or block-diagonal according to crisp species (Pasarkar et al., 2023, Nguyen et al., 13 May 2025, Nielsen et al., 26 Sep 2025, Nguyen et al., 5 Nov 2025).
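The definition translates directly into a few lines of code. The following is a minimal sketch that assumes NumPy. The helper names `vendi_from_eigenvalues` and `vendi` are hypothetical and are not the API of any published package. Eigenvalues below a tolerance `tol` are treated as zero to absorb floating-point round-off, which is consistent with the $0 \log 0 = 0$ convention.

```python
import numpy as np

def vendi_from_eigenvalues(lam, q=1.0, tol=1e-12):
    """Return (H_q, VS_q) from the eigenvalues of a density matrix rho."""
    lam = np.asarray(lam, dtype=float)
    lam = lam[lam > tol]                  # drop zero / round-off eigenvalues (0 log 0 = 0)
    lam = lam / lam.sum()                 # re-normalize so the eigenvalues sum to 1
    if np.isclose(q, 1.0):
        h = -np.sum(lam * np.log(lam))    # q -> 1 limit (Shannon / von Neumann form)
    else:
        h = np.log(np.sum(lam ** q)) / (1.0 - q)  # Renyi form for q != 1
    return h, np.exp(h)

def vendi(K, q=1.0):
    """Vendi entropy and score of a PSD Gram matrix K."""
    K = np.asarray(K, dtype=float)
    rho = K / np.trace(K)                 # density matrix
    lam = np.linalg.eigvalsh(rho)         # eigenvalues of a symmetric matrix
    return vendi_from_eigenvalues(lam, q)

h, vs = vendi(np.eye(5))            # five mutually orthogonal items
h1, vs1 = vendi(np.ones((5, 5)))    # five identical items
print(vs, vs1)
```

For the identity Gram matrix the score is $n$. For the all-ones matrix it is $1$, which matches the extremal cases listed in the next section.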

2. Interpretation and Theoretical Properties

The eigenvalues $\lambda_i$ of $\rho$ represent the weights of orthogonal “principal modes” in the data, much like the occupation probabilities of eigenstates in quantum statistical mechanics. Vendi entropy therefore measures how “spread out” the data are across these modes, quantifying the effective number of orthogonal directions (“ur-elements”, Editor's term) actually contributing to the diversity of the sample set (Nguyen et al., 5 Nov 2025).

Key theoretical properties include:

Bounds: $0 \leq H_q(\rho) \leq \log n$ and $1 \leq \mathrm{VS}_q \leq n$.
Extremal cases: $\mathrm{VS}_q = 1$ if all items are maximally similar ($K$ is the all-ones matrix); $\mathrm{VS}_q = n$ if all items are orthogonal (i.e., perfectly dissimilar, $K = I$).
Sensitivity: The order parameter $q$ tunes sensitivity to abundance/skew. $q < 1$ emphasizes rare or outlying items (small eigenvalues); $q > 1$ emphasizes dominant clusters (large eigenvalues).
Additivity: Under independence, $H_q$ adds over subsystems, $H_q(\rho_A \otimes \rho_B) = H_q(\rho_A) + H_q(\rho_B)$, so $\mathrm{VS}_q$ is multiplicative.
Reduction: Suppose a kernel partitions the data into $S$ disjoint species with abundances $p_1, \ldots, p_S$, with $k(x_i,x_j)=1$ within a species and $0$ across species. Then the nonzero eigenvalues of $\rho$ are precisely $p_1, \ldots, p_S$, so $H_q(\rho)$ gives the (generalized) Shannon entropy, i.e., the Rényi entropy of order $q$ of the abundances (Pasarkar et al., 2023).
Permutation invariance: $H_q$ (and hence $\mathrm{VS}_q$) depends only on pairwise similarities, not on data order (Friedman et al., 2022).
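The reduction and additivity properties can be checked numerically. This sketch assumes NumPy; `vendi_entropy` and `renyi_entropy` are hypothetical helpers. It builds a species kernel that is $1$ within a species and $0$ across species. It also relies on the identity $\operatorname{Tr}(K_A \otimes K_B) = \operatorname{Tr}(K_A)\operatorname{Tr}(K_B)$, so the Kronecker product of two Gram matrices normalizes to $\rho_A \otimes \rho_B$.

```python
import numpy as np

def vendi_entropy(K, q=1.0, tol=1e-12):
    """H_q of the density matrix rho = K / Tr(K)."""
    lam = np.linalg.eigvalsh(K / np.trace(K))
    lam = lam[lam > tol]                  # drop zero and round-off eigenvalues
    lam = lam / lam.sum()
    if np.isclose(q, 1.0):
        return -np.sum(lam * np.log(lam))
    return np.log(np.sum(lam ** q)) / (1.0 - q)

def renyi_entropy(p, q=1.0):
    """Renyi entropy of a probability vector (Shannon entropy at q = 1)."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]
    if np.isclose(q, 1.0):
        return -np.sum(p * np.log(p))
    return np.log(np.sum(p ** q)) / (1.0 - q)

# Reduction: 3 items of species A and 1 of species B (abundances 0.75 and 0.25).
labels = np.array([0, 0, 0, 1])
K_species = (labels[:, None] == labels[None, :]).astype(float)  # block-diagonal all-ones
p = np.bincount(labels) / len(labels)
for q in (0.5, 1.0, 2.0):
    print(q, vendi_entropy(K_species, q), renyi_entropy(p, q))

# Additivity: the Kronecker product of two Gram matrices normalizes to rho_A (x) rho_B.
K_B = np.array([[1.0, 0.3], [0.3, 1.0]])
K_AB = np.kron(K_species, K_B)
print(vendi_entropy(K_AB, 2.0), vendi_entropy(K_species, 2.0) + vendi_entropy(K_B, 2.0))
```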

3. Construction, Computation, and Approximation

The step-by-step workflow for computing Vendi entropy is as follows (Friedman et al., 2022, Pasarkar et al., 2023, Ospanov et al., 2024):

Similarity Kernel: Choose/define a symmetric PSD kernel $k$ reflecting meaningful similarity in the domain; normalize so $k(x,x) = 1$.
Gram Matrix: Build $K \in \mathbb{R}^{n \times n}$ with entries $K_{ij} = k(x_i, x_j)$.
Density Matrix: Normalize: $\rho = K/n$ (if $k(x,x) = 1$; otherwise $\rho = K/\operatorname{Tr}(K)$).
Eigendecomposition: Compute the eigenvalues $\lambda_1, \ldots, \lambda_n$ of $\rho$.
Normalization: By construction, $\lambda_i \geq 0$ and $\sum_i \lambda_i = 1$.
Evaluation: Compute $H_q(\rho)$ and/or $\mathrm{VS}_q$ by the above formulas.
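The six steps can be sketched end to end for vector data. This example assumes NumPy and a Gaussian (RBF) kernel $k(x,y) = \exp(-\lVert x-y\rVert^2 / 2\sigma^2)$ with a user-chosen bandwidth `sigma`. The function names and the toy data are hypothetical. Three tight, well-separated clusters should give a score close to 3.

```python
import numpy as np

def rbf_gram(X, sigma):
    """Gaussian (RBF) Gram matrix; the diagonal equals 1 up to round-off."""
    sq = np.sum(X ** 2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)  # squared distances
    return np.exp(-d2 / (2.0 * sigma ** 2))

def vendi_score_from_features(X, sigma=1.0, q=1.0, tol=1e-12):
    K = rbf_gram(X, sigma)                 # steps 1-2: kernel and Gram matrix
    rho = K / np.trace(K)                  # step 3: density matrix
    lam = np.linalg.eigvalsh(rho)          # step 4: eigendecomposition
    lam = lam[lam > tol]
    lam = lam / lam.sum()                  # step 5: eigenvalues sum to one
    if np.isclose(q, 1.0):                 # step 6: entropy, then score
        h = -np.sum(lam * np.log(lam))
    else:
        h = np.log(np.sum(lam ** q)) / (1.0 - q)
    return np.exp(h)

rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]])
X = np.repeat(centers, 20, axis=0) + 0.01 * rng.standard_normal((60, 2))
print(vendi_score_from_features(X, sigma=1.0))   # close to 3: three tight, separated clusters
```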

Computational cost is dominated by the $O(n^3)$ eigendecomposition. For large $n$, the spectrum may be approximated via:

Nyström approximation: Subsampling columns/rows of $K$.
Random Fourier Features or FKEA: Feature embedding for shift-invariant kernels.
Truncation: For high-dimensional kernels, the $t$-truncated Vendi score, based on the top $t$ eigenvalues of $\rho$, admits finite-sample convergence guarantees (Ospanov et al., 2024).
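As an illustration of the random-feature route, the sketch below uses the standard random Fourier feature map for the RBF kernel. That map is $z(x) = \sqrt{2/D}\,\cos(W^\top x + b)$ with $W \sim \mathcal{N}(0, \sigma^{-2} I)$ and $b \sim \mathrm{Unif}[0, 2\pi]$, so that $Z Z^\top \approx K$. Because $Z^\top Z$ ($D \times D$) has the same nonzero eigenvalues as $Z Z^\top$ ($n \times n$), only a $D \times D$ eigenproblem is needed. This is a generic random-features sketch, not the FKEA algorithm itself. NumPy, the feature count $D$, and the helper names are assumptions. The approximation error shrinks as $D$ grows.

```python
import numpy as np

def vs_from_spectrum(lam, q=2.0, tol=1e-12):
    """VS_q from nonnegative (possibly unnormalized) eigenvalues."""
    lam = lam[lam > tol]
    lam = lam / lam.sum()                 # normalize to a density-matrix spectrum
    if np.isclose(q, 1.0):
        return np.exp(-np.sum(lam * np.log(lam)))
    return np.sum(lam ** q) ** (1.0 / (1.0 - q))

def exact_vs(X, sigma, q=2.0):
    """Reference value from the full n x n RBF Gram matrix."""
    d2 = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    K = np.exp(-d2 / (2.0 * sigma ** 2))
    return vs_from_spectrum(np.linalg.eigvalsh(K), q)

def rff_vs(X, sigma, n_features=800, q=2.0, seed=0):
    """Approximate VS_q with random Fourier features for the RBF kernel."""
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    W = rng.normal(scale=1.0 / sigma, size=(d, n_features))  # frequencies ~ N(0, sigma^-2 I)
    b = rng.uniform(0.0, 2.0 * np.pi, size=n_features)       # random phases
    Z = np.sqrt(2.0 / n_features) * np.cos(X @ W + b)        # n x D, Z @ Z.T approximates K
    # Z.T @ Z (D x D) has the same nonzero eigenvalues as Z @ Z.T (n x n).
    return vs_from_spectrum(np.linalg.eigvalsh(Z.T @ Z), q)

rng = np.random.default_rng(42)
X = rng.standard_normal((1500, 2))
print(exact_vs(X, sigma=1.0), rff_vs(X, sigma=1.0))
```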

Selection of kernel bandwidth (e.g., for RBF kernels) crucially affects sensitivity and interpretation; the scale should be chosen such that VS is nontrivial, that is, neither $\approx 1$ nor $\approx n$ (Nguyen et al., 5 Nov 2025, Ospanov et al., 2024).
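A bandwidth scan makes the two degenerate regimes visible. This sketch assumes NumPy, a hypothetical helper `vendi_score_rbf`, and synthetic Gaussian data. A very small `sigma` drives $K$ toward the identity and VS toward $n$. A very large `sigma` drives $K$ toward the all-ones matrix and VS toward $1$. Useful values lie in between.

```python
import numpy as np

def vendi_score_rbf(X, sigma, q=1.0, tol=1e-12):
    d2 = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    K = np.exp(-d2 / (2.0 * sigma ** 2))          # RBF Gram matrix
    lam = np.linalg.eigvalsh(K / np.trace(K))     # spectrum of rho
    lam = lam[lam > tol]
    lam = lam / lam.sum()
    if np.isclose(q, 1.0):
        return np.exp(-np.sum(lam * np.log(lam)))
    return np.sum(lam ** q) ** (1.0 / (1.0 - q))

rng = np.random.default_rng(1)
X = rng.standard_normal((200, 8))                 # n = 200 points in 8 dimensions
for sigma in np.logspace(-2, 3, 6):               # 0.01, 0.1, ..., 1000
    print(f"sigma={sigma:8.2f}  VS={vendi_score_rbf(X, sigma):7.2f}")
```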

4. Connections to Other Diversity and Entropy Measures

Vendi entropy generalizes classical diversity metrics:

Shannon Entropy and Hill Numbers: When $k$ encodes complete dissimilarity between distinct items (i.e., $k(x_i, x_j) = 0$ whenever $x_i \neq x_j$), the Vendi entropy reduces to the Shannon entropy of the item frequencies (the Rényi entropy for general $q$), and $\mathrm{VS}_q$ reduces to the Hill numbers of order $q$.
Von Neumann Entropy: For quantum systems where $\rho$ is a density matrix, the $q = 1$ case coincides with von Neumann entropy, $-\operatorname{Tr}(\rho \log \rho)$.
Relationship to LCR Entropy: The Leinster–Cobbold–Reeve (LCR) entropy is another similarity-sensitive diversity metric. For uniform probabilities and PSD $K$, empirical and analytic findings suggest $\mathrm{VS}_q \geq \mathrm{LCR}_q$ for all $q$ and commonly used similarity matrices. VS emphasizes effective mixture over similarity-orthogonal “modes,” while LCR focuses on typicality or “ordinariness” of elements (Nguyen et al., 5 Nov 2025).
Rényi and Tsallis Generalizations: Both forms can be applied to the spectrum of $\rho$, yielding the full family of entropy-based diversity indices (Pasarkar et al., 2023, Jalali et al., 2024).
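The two measures can be computed side by side. This sketch assumes NumPy and uniform weights $p_i = 1/n$. It also assumes the Leinster–Cobbold form $\mathrm{LCR}_q = \big(\sum_i p_i (Kp)_i^{q-1}\big)^{1/(1-q)}$, which becomes $\exp(-\sum_i p_i \log (Kp)_i)$ at $q=1$. The helper names are hypothetical.

At $q = 2$ the comparison can be verified by hand. $\mathrm{VS}_2 = n^2/\sum_{ij} K_{ij}^2$ and $\mathrm{LCR}_2 = n^2/\sum_{ij} K_{ij}$. Because $K_{ij}^2 \leq K_{ij}$ whenever $0 \leq K_{ij} \leq 1$, it follows that $\mathrm{VS}_2 \geq \mathrm{LCR}_2$.

```python
import numpy as np

def vendi_score(K, q, tol=1e-12):
    lam = np.linalg.eigvalsh(K / np.trace(K))
    lam = lam[lam > tol]
    lam = lam / lam.sum()
    if np.isclose(q, 1.0):
        return np.exp(-np.sum(lam * np.log(lam)))
    return np.sum(lam ** q) ** (1.0 / (1.0 - q))

def lcr_diversity(Z, p, q):
    """Leinster-Cobbold similarity-sensitive diversity of order q (assumed form)."""
    Zp = Z @ p                            # "ordinariness" of each item
    mask = p > 0
    if np.isclose(q, 1.0):
        return np.exp(-np.sum(p[mask] * np.log(Zp[mask])))
    return np.sum(p[mask] * Zp[mask] ** (q - 1.0)) ** (1.0 / (1.0 - q))

rng = np.random.default_rng(2)
X = rng.standard_normal((100, 3))
d2 = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
K = np.exp(-d2 / 2.0)                     # RBF Gram matrix, entries in (0, 1]
p = np.full(len(X), 1.0 / len(X))         # uniform weights
for q in (1.0, 2.0):
    print(q, vendi_score(K, q), lcr_diversity(K, p, q))
```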

5. Applications Across Disciplines

Vendi entropy is broadly used for quantifying diversity in data-driven settings:

Machine Learning: Detects mode collapse in GANs, characterizes sample diversity, and enables diversity-regularized training in generative models. Vendi-based regularization can improve sample diversity in diffusion and GAN setups (Friedman et al., 2022, Farnia et al., 16 Feb 2026).
Experimental Design and Active Search: Quality-weighted Vendi scores allow balancing exploration (diversity) and exploitation (quality) in Bayesian optimization and active data acquisition policies, increasing effective discoveries (Nguyen et al., 2024, Nguyen et al., 13 May 2025).
Genomic Epidemiology: Classification-independent quantification of viral population diversity in time-resolved sequence datasets and variant detection; tuning $q$ reveals different aspects of clade emergence and variant sweeps (Nielsen et al., 26 Sep 2025).
Information Theory: Vendi Information Gain (VIG) is a similarity-sensitive, sample-based alternative to mutual information, overcoming the symmetry and tractability limitations of classical MI, especially when explicit probability distributions are unavailable (Nguyen et al., 13 May 2025, Jalali et al., 2024).
Conditional Evaluation: The Conditional Vendi score and Information-Vendi score decompose total diversity into model-induced and prompt-aligned components in prompt-based generative models, supporting nuanced evaluation of both conditional and unconditional diversity (Jalali et al., 2024).
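The sketch below illustrates the quality/diversity trade-off. It assumes that the quality-weighted Vendi score is the mean quality of a batch multiplied by its Vendi score (order $q = 1$). That form, the quality values, and the helper names are assumptions for illustration, not a verified reproduction of Nguyen et al. (2024); NumPy is assumed. In this example, a dissimilar pair with lower mean quality outscores a near-duplicate pair of high quality.

```python
import numpy as np

def vendi_score(K, tol=1e-12):
    lam = np.linalg.eigvalsh(K / np.trace(K))
    lam = lam[lam > tol]
    lam = lam / lam.sum()
    return np.exp(-np.sum(lam * np.log(lam)))    # order q = 1

def quality_weighted_vs(K, quality):
    """ASSUMED form: mean quality of the batch times its Vendi score."""
    return np.mean(quality) * vendi_score(K)

# Two hypothetical candidate batches of two items each.
K_dup = np.array([[1.0, 0.95], [0.95, 1.0]])     # near-duplicates
K_div = np.eye(2)                                # dissimilar pair
print(quality_weighted_vs(K_dup, np.array([0.9, 0.9])))
print(quality_weighted_vs(K_div, np.array([0.9, 0.6])))
```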

6. Practical Considerations, Limitations, and Guidelines

Kernel Choice and Scale:

The user must select or tune $k$ to match domain-relevant similarity (e.g., RBF for continuous features, cosine for embeddings, Tanimoto for molecular fingerprints, Hamming for sequences).
The kernel's length scale is critical, for example the half-distance $h$ at which similarity drops to $1/2$. Scan over $h$ to ensure VS lies in a meaningful range (Nguyen et al., 5 Nov 2025).
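For sequence data, one way to realize half-distance scaling is a Hamming-based kernel $k(a,b) = 2^{-d_H(a,b)/h}$, which equals $1/2$ at Hamming distance $h$. The kernel is PSD because it factorizes over positions into kernels of the form $c + (1-c)\,\delta$ with $c = 2^{-1/h} \in (0,1)$, and each of those is a sum of PSD terms. The kernel form, the toy sequences, and the helper names are assumptions for illustration; NumPy is assumed. Scanning `h` shows VS moving from near $n$ (small $h$) toward $1$ (large $h$).

```python
import numpy as np

def hamming_half_distance_kernel(seqs, h):
    """k(a, b) = 2 ** (-d_H(a, b) / h); similarity is 1/2 at Hamming distance h."""
    A = np.array([list(s) for s in seqs])               # equal-length sequences as chars
    d = (A[:, None, :] != A[None, :, :]).sum(axis=-1)   # pairwise Hamming distances
    return 2.0 ** (-d / h)

def vendi_score(K, tol=1e-12):
    lam = np.linalg.eigvalsh(K / np.trace(K))
    lam = lam[lam > tol]
    lam = lam / lam.sum()
    return np.exp(-np.sum(lam * np.log(lam)))

seqs = ["ACGTACGT", "ACGTACGA", "ACGTACCA", "TTGTACGT", "GGCCTTAA"]
for h in (0.25, 1.0, 4.0, 64.0):
    print(h, round(vendi_score(hamming_half_distance_kernel(seqs, h)), 3))
```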

Computational Issues:

Direct computation is $O(n^3)$ in time and $O(n^2)$ in memory. For large $n$, use Nyström or random features for rapid spectrum approximation (Ospanov et al., 2024, Pasarkar et al., 2023).
Truncated Vendi scores are recommended for high-dimensional data or infinite-dimensional kernels; sample complexity is governed by the effective rank of the kernel (Ospanov et al., 2024).
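A Nyström variant can be sketched as follows. It assumes NumPy, an RBF kernel, and uniformly sampled landmarks, all of which are hypothetical choices. Let $W$ be the landmark block and $C$ the cross block, and let $W = V \Lambda V^\top$, restricted to eigenvalues above a small threshold. Then the approximation $\tilde K = C W^{+} C^\top$ equals $G G^\top$ for $G = C V \Lambda^{-1/2}$. Its nonzero spectrum therefore comes from the small matrix $G^\top G$, and normalizing by its trace gives an approximate density-matrix spectrum.

```python
import numpy as np

def rbf(A, B, sigma):
    """RBF kernel matrix between the rows of A and the rows of B."""
    d2 = np.sum((A[:, None, :] - B[None, :, :]) ** 2, axis=-1)
    return np.exp(-d2 / (2.0 * sigma ** 2))

def vs_from_spectrum(lam, q=2.0, tol=1e-12):
    lam = lam[lam > tol]
    lam = lam / lam.sum()                 # normalize to a density-matrix spectrum
    if np.isclose(q, 1.0):
        return np.exp(-np.sum(lam * np.log(lam)))
    return np.sum(lam ** q) ** (1.0 / (1.0 - q))

def nystrom_vs(X, sigma, m=100, q=2.0, seed=0):
    """Approximate VS_q from a Nystrom approximation K ~ C W^+ C^T."""
    rng = np.random.default_rng(seed)
    idx = rng.choice(len(X), size=m, replace=False)   # random landmark points
    C = rbf(X, X[idx], sigma)                         # n x m block of K
    w, V = np.linalg.eigh(C[idx])                     # W = K[idx][:, idx]
    keep = w > 1e-8 * w.max()                         # pseudo-inverse: drop near-null directions
    G = C @ (V[:, keep] / np.sqrt(w[keep]))           # n x r, and G @ G.T = C W^+ C^T
    # G.T @ G (r x r) shares its nonzero eigenvalues with the n x n matrix G @ G.T.
    return vs_from_spectrum(np.linalg.eigvalsh(G.T @ G), q)

rng = np.random.default_rng(7)
X = rng.standard_normal((1500, 2))
exact = vs_from_spectrum(np.linalg.eigvalsh(rbf(X, X, 1.0)))
print(exact, nystrom_vs(X, sigma=1.0))
```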

Convergence and Bias:

The standard Vendi score may fail to converge to its infinite-sample limit under infinite-dimensional kernels, motivating the use of truncated or approximate approaches for finite-sample studies (Ospanov et al., 2024).
Sampling bias: as in classical entropy estimation, plug-in estimates for finite samples typically underestimate true population diversity, resulting in "downward diversity bias" when assessing the output diversity of generative models (Farnia et al., 16 Feb 2026).
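The downward bias is easy to reproduce. In this sketch (NumPy assumed; the population is hypothetical), samples are drawn from 50 equally likely, mutually dissimilar modes, whose population Vendi score is 50. Because $\mathrm{VS} \leq n$, a sample of 10 items cannot score above 10, and the plug-in estimate approaches 50 only as the sample grows.

```python
import numpy as np

def vendi_score(K, tol=1e-12):
    lam = np.linalg.eigvalsh(K / np.trace(K))
    lam = lam[lam > tol]
    lam = lam / lam.sum()
    return np.exp(-np.sum(lam * np.log(lam)))

rng = np.random.default_rng(3)
n_categories = 50    # hypothetical population: 50 equally likely, mutually dissimilar modes

def sample_vs(n):
    """Plug-in Vendi score of n samples drawn from the population."""
    c = rng.integers(0, n_categories, size=n)
    K = (c[:, None] == c[None, :]).astype(float)   # 1 if same mode, else 0
    return vendi_score(K)

for n in (10, 50, 200, 1000):
    print(n, round(sample_vs(n), 2))
```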

Interpretation:

VS provides an “effective number of orthogonal modes” (principal directions) in the data, not simply a count of unique items (Nguyen et al., 5 Nov 2025).
For block structure or taxonomic partitions, VS is sensitive to both the number and similarity of clusters, unlike classical counts.
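A two-cluster example shows how VS depends on inter-cluster similarity. Take two clusters of $m$ identical items each, with cross-cluster similarity $s \in [0,1]$. Then $K = \begin{pmatrix} J & sJ \\ sJ & J \end{pmatrix}$, where $J$ is the $m \times m$ all-ones matrix, and $\rho$ has nonzero eigenvalues $(1 \pm s)/2$. It follows that $\mathrm{VS}_2 = 2/(1+s^2)$: the two clusters count as 2 modes when $s = 0$ and collapse to 1 when $s = 1$. The numerical check below assumes NumPy and uses hypothetical helper names.

```python
import numpy as np

def vendi_score(K, q=2.0, tol=1e-12):
    lam = np.linalg.eigvalsh(K / np.trace(K))
    lam = lam[lam > tol]
    lam = lam / lam.sum()
    if np.isclose(q, 1.0):
        return np.exp(-np.sum(lam * np.log(lam)))
    return np.sum(lam ** q) ** (1.0 / (1.0 - q))

def two_cluster_gram(m, s):
    """Two blocks of m identical items with cross-block similarity s."""
    J = np.ones((m, m))
    return np.block([[J, s * J], [s * J, J]])

for s in (0.0, 0.5, 0.9, 1.0):
    numeric = vendi_score(two_cluster_gram(5, s))
    print(s, round(numeric, 4), round(2.0 / (1.0 + s ** 2), 4))   # numeric vs closed form
```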

Kernel Specification	Sensitivity and Range	Use Case
Identity (diagonal)	VS = $n$ (max; all distinct)	Crisp categories
Constant-one	VS = $1$ (min; all similar)	Collapsed items
RBF/Exponential (half-distance)	Interpolates [1, $n$]	Embeddings, metric data
Cosine	Interpolates [1, $n$]	Text/image embeddings; perceptual similarity

7. Comparative and Domain-Specific Guidance

Vendi entropy is generally suited for applications where pairwise similarity between samples encodes domain-relevant structure or when one works directly with feature spaces or kernels. It is preferred to LCR or other metrics when eigenspectrum-based analysis (principal components, quantum mechanical analogies) is meaningful, and when the similarity matrix is guaranteed to be PSD. LCR entropy, by contrast, may be used for non-PSD similarities or when “ordinariness”-based interpretations are desired.

Empirical studies indicate that LCR and VS may diverge substantially except in extreme parameter regimes, suggesting that both may be computed in parallel for a full understanding of diversity structure (Nguyen et al., 5 Nov 2025).

A plausible implication is that, in applications where the diversity of underlying mechanisms (orthogonal modes) matters more than typical pairwise similarity, VS provides the relevant quantification; whereas, for questions of ecological ordinariness or classical diversity, LCR or related measures may be more pertinent.

In summary, Vendi entropy provides a unified, reference-free, similarity-sensitive framework for diversity quantification, bridging ecology, physics, information theory, and machine learning, and offers robust extensions and practical adaptations for large-scale and kernel-based scientific data analysis (Pasarkar et al., 2023, Friedman et al., 2022, Farnia et al., 16 Feb 2026, Ospanov et al., 2024, Nguyen et al., 5 Nov 2025, Nguyen et al., 13 May 2025, Jalali et al., 2024, Nielsen et al., 26 Sep 2025, Nguyen et al., 2024).