
Vendi Score (VS): Flexible Diversity Metric

Updated 29 September 2025
  • Vendi Score (VS) is a reference-free diversity metric defined as the exponential of the entropy of the eigenvalue spectrum of a similarity matrix.
  • It leverages a user-defined similarity function to adaptively measure diversity across various domains like images, text, molecules, and ecological data.
  • Its differentiable formulation supports integration into optimization pipelines, aiding in quality control and detecting issues like mode collapse in generative models.

The Vendi Score (VS) is a flexible, reference-free diversity metric that quantifies the effective number of distinct items in a sample, generalizing classical diversity indices by explicitly leveraging pairwise similarities. Developed in the context of machine learning but broadly applicable to fields ranging from ecology to genomics, VS is constructed as the exponential of the Shannon entropy (or its Rényi generalizations) of the eigenvalue spectrum of a similarity (kernel) matrix. Unlike domain-specific metrics, VS relies on a user-defined similarity function, enabling practitioners to target fine-grained notions of diversity across heterogeneous data domains without requiring a reference dataset or label distributions.

1. Mathematical Formulation and Theoretical Foundations

The core definition of the Vendi Score is as follows. Given $N$ objects $x_1, \ldots, x_N$ and a positive semidefinite similarity kernel $k(\cdot, \cdot)$ satisfying $k(x, x) = 1$, the similarity matrix $K \in \mathbb{R}^{N \times N}$ is formed as $K_{ij} = k(x_i, x_j)$. Normalizing by $N$, i.e., taking $K/N$, yields a matrix whose eigenvalues $\lambda_1, \ldots, \lambda_N$ sum to one. The Vendi Score is defined as

$$\text{VS}_k(x_1, \ldots, x_N) = \exp\left(-\sum_{i=1}^{N} \lambda_i \log \lambda_i \right).$$

This formulation is recognized as the exponential of the von Neumann (Shannon matrix) entropy, which gives the VS a concrete interpretation as the “effective number” of maximally dissimilar items.
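
A minimal NumPy sketch of this computation follows; the function name and numerical tolerance are our own illustration, not the API of any released Vendi Score package.

```python
import numpy as np

def vendi_score(K: np.ndarray) -> float:
    """Exponential of the Shannon entropy of the eigenvalues of K/N.

    K must be positive semidefinite with unit diagonal (k(x, x) = 1),
    so the eigenvalues of K/N are nonnegative and sum to one.
    """
    n = K.shape[0]
    eigvals = np.linalg.eigvalsh(K / n)
    eigvals = eigvals[eigvals > 1e-12]   # drop numerical zeros (0 log 0 := 0)
    return float(np.exp(-np.sum(eigvals * np.log(eigvals))))
```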

The VS can be generalized using the Rényi entropy of order $q$,

$$\text{VS}_q(x_1, \ldots, x_N; k) = \exp\left(\frac{1}{1-q} \log \sum_{i} \lambda_i^q\right),$$

where $q < 1$ increases sensitivity to rare types, while $q > 1$ enhances sensitivity to abundant or duplicated types (Pasarkar et al., 2023); the Shannon case above is recovered in the limit $q \to 1$. When $K$ is block-diagonal, with unit similarity within blocks and zero similarity between blocks, the nonzero eigenvalues recover the normalized prevalence vector of the blocks, reproducing classical Hill numbers. This generalization unifies abundance weighting and similarity-induced diversity within a single, differentiable, eigenvalue-based framework.
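
A corresponding sketch of the order-$q$ score, handling the Shannon ($q \to 1$) and min-entropy ($q = \infty$) cases as limits, might look as follows (again, helper names are illustrative):

```python
import numpy as np

def vendi_score_q(K: np.ndarray, q: float = 1.0) -> float:
    """Order-q Vendi Score: exp of the Rényi entropy of the spectrum of K/n."""
    n = K.shape[0]
    lam = np.linalg.eigvalsh(K / n)
    lam = lam[lam > 1e-12]
    if q == 1.0:                       # Shannon limit of the Rényi entropy
        return float(np.exp(-np.sum(lam * np.log(lam))))
    if np.isinf(q):                    # min-entropy limit: 1 / max eigenvalue
        return float(1.0 / lam.max())
    return float(np.exp(np.log(np.sum(lam ** q)) / (1.0 - q)))
```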

2. Specification and Role of the Similarity Function

The similarity kernel $k(\cdot, \cdot)$ is central to the adaptability of VS. It may be chosen to suit the target domain:

  • Images: $k(x, y)$ as cosine similarity of deep feature embeddings (e.g., Inception), or as raw pixel cosine similarity.
  • Text: $k(x, y)$ as n-gram overlap across bag-of-n-grams vectors, or semantic similarity via sentence embeddings (e.g., SimCSE).
  • Molecules: $k(x, y)$ as fingerprint similarity (e.g., Morgan fingerprints).

The only constraints are $k(x, x) = 1$ (unit self-similarity) and positive semidefiniteness, which guarantees nonnegative eigenvalues. VS thus enables users to “tune” the diversity criterion (structural, semantic, functional) by varying $k$. The choice of kernel influences both sensitivity and interpretability, and may be adapted for domain-specific purposes such as molecular structure comparison, image visual diversity, or transcriptomic variation (Friedman et al., 2022).
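
As a concrete sketch, a cosine-similarity kernel over feature embeddings can be assembled as below; `embed` is a hypothetical stand-in for any domain-specific featurizer (Inception, SimCSE, molecular fingerprints):

```python
import numpy as np

def cosine_kernel(X: np.ndarray) -> np.ndarray:
    """Cosine-similarity Gram matrix from an (n, d) embedding matrix.

    Row-normalizing guarantees unit self-similarity, and the Gram
    matrix X X^T is positive semidefinite by construction.
    """
    X = X / np.linalg.norm(X, axis=1, keepdims=True)
    return X @ X.T

# Hypothetical usage: embed(items) -> (n, d) array of feature vectors.
# K = cosine_kernel(embed(items))
# print(vendi_score(K))
```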

3. Interpretability and Effective Number Principle

A key virtue of VS is interpretability. The VS value can be read as the “effective number” of distinct objects, meaning the diversity in the set is equivalent to having VS-many mutually dissimilar items. This property matches the interpretive advantages of Hill numbers but does not presuppose predefined categories, class assignments, or reference distributions. VS is therefore suitable for diagnosing internal sample diversity, inspecting data for redundancy, and guiding augmentation procedures. For instance, a molecular generative model with a VS of 250.9 (compared to a possible set size of thousands) indicates a high degree of duplicated or nearly duplicated structures, even when pairwise diversity scores are high by conventional metrics (Friedman et al., 2022).
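
A few sanity checks with the `vendi_score` sketch from Section 1 make this reading concrete: exact duplicates collapse the score to 1, mutually dissimilar items push it to $N$, and near-duplicate pairs land near the number of effective clusters.

```python
import numpy as np

# Four copies of the same item: K is all ones, one "effective" item.
print(vendi_score(np.ones((4, 4))))    # ≈ 1.0

# Four mutually dissimilar items: K is the identity, VS = 4.
print(vendi_score(np.eye(4)))          # ≈ 4.0

# Two near-duplicate pairs: roughly two effective items.
K = np.array([[1.00, 0.95, 0.00, 0.00],
              [0.95, 1.00, 0.00, 0.00],
              [0.00, 0.00, 1.00, 0.95],
              [0.00, 0.00, 0.95, 1.00]])
print(vendi_score(K))                  # ≈ 2.2
```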

4. Comparison with Traditional Diversity Metrics

Standard diversity metrics in machine learning and ecology include:

| Metric | Underlying Principle | Limitations (per VS framework) |
| --- | --- | --- |
| Mean Pairwise Dissimilarity (IntDiv) | Average distance between items | Misses correlation structure; insensitive to clusters |
| Reference-Based Diversity | Comparison against a held-out benchmark set | Not internal; depends on external labels |
| N-gram Diversity (text) | Repetition detected via local statistics | Sensitive to sentence length; does not fully capture semantic variety |
| Hill Numbers | Abundance-weighted category counts | Requires explicit classification; ignores continuous similarity |

VS differs by incorporating the full correlation structure across all samples. It computes the entropy over eigenvalues, which encodes the “soft rank,” or effective dimension, of the sample set. VS is sensitive to subtle duplication and clustering, which matters for detecting phenomena like mode collapse in GANs (where generators replicate modes but not internal variance) that may be undetectable by simple class-count measures (Friedman et al., 2022). When $q = \infty$, VS reduces to the reciprocal of the maximum eigenvalue (fully sensitive to the dominant cluster), while $q < 1$ places greater weight on rare entities, as illustrated below. This flexibility is mathematically rigorous and affords diagnostics unavailable in classical metrics (Pasarkar et al., 2023).
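
Reusing the `vendi_score_q` sketch from Section 1, the following contrived example shows how the order $q$ shifts emphasis between a dominant duplicate cluster and a rare item:

```python
import numpy as np

# Five items: a block of four exact duplicates plus one rare, dissimilar item.
K = np.eye(5)
K[:4, :4] = 1.0        # unit similarity within the duplicate block

for q in (0.5, 1.0, 2.0, np.inf):
    print(q, round(vendi_score_q(K, q), 3))
# 0.5 -> 1.8    (rare item weighted up)
# 1.0 -> 1.649  (Shannon case / standard VS)
# 2.0 -> 1.471  (dominant cluster weighted up)
# inf -> 1.25   (reciprocal of the largest eigenvalue, 0.8)
```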

5. Applications Across Scientific Domains

VS has seen practical deployment in a variety of scientific contexts:

  • Molecular Generative Modeling: Applied to HMMs, VAEs, and RNNs for molecule generation, VS revealed near-duplicate bias in models with high IntDiv scores, highlighting the need for similarity-aware diversity estimation.
  • Image Generation: Used with semantic and pixel-based kernels on CIFAR-10, ImageNet, and LSUN, VS exposed differences in semantic versus pixel-level diversity across architectures (VDVAE, DenseFlow, StyleGAN, etc.).
  • Text Decoding: VS measured diversity among captions under different decoding strategies, outperforming n-gram diversity in distinguishing true sentence variety from repetition artifacts.
  • Mode Collapse in GANs: VS identified GANs that produce the correct number of modes but suffer from sample duplication within modes, addressing the limits of classifier-based measures of mode coverage.
  • Dataset Diagnosis: VS has been used for quality control in ML and scientific datasets, identifying redundancy and guiding augmentation strategies.

These results demonstrate VS’s ability to diagnose sample diversity, inform optimization and sampling strategies, and complement or surpass traditional metrics across disciplines (Friedman et al., 2022).

6. Algorithmic and Computational Considerations

Computationally, VS requires an eigendecomposition of the normalized similarity matrix, which for $N$ samples entails $O(N^3)$ complexity. In large-scale applications, this can be mitigated via spectral truncation, randomized approximation, or kernel approximation techniques. Notably, VS is entirely reference-free and requires no access to a held-out dataset or external labels; it needs only a user-specified similarity function and the sample set.
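
In particular, when the kernel is an explicit inner product of $d$-dimensional features with $d \ll N$, the nonzero eigenvalues of $K/N = XX^\top/N$ coincide with those of the $d \times d$ matrix $X^\top X / N$, reducing the cost from $O(N^3)$ to roughly $O(N d^2)$. A sketch of this shortcut (our own helper, assuming row-normalized features):

```python
import numpy as np

def vendi_score_lowrank(X: np.ndarray) -> float:
    """VS from a feature matrix X of shape (n, d), in O(n d^2) time.

    The nonzero eigenvalues of (X X^T)/n equal those of (X^T X)/n,
    so we diagonalize the small d x d matrix instead of the n x n one.
    """
    n = X.shape[0]
    X = X / np.linalg.norm(X, axis=1, keepdims=True)
    lam = np.linalg.eigvalsh(X.T @ X / n)
    lam = lam[lam > 1e-12]
    return float(np.exp(-np.sum(lam * np.log(lam))))
```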

The metric is differentiable and thus suited to integration within optimization pipelines (e.g., reinforcement learning intrinsic rewards, sampling force computation in molecular simulation, diversity-augmented generative model training). The partitioning property of the Shannon entropy inherited by VS also enables weighted formulations and mixture decomposition, supporting advanced theoretical constructs such as clade-level diversity in phylogenetic trees or mode-specific diversity in multimodal generative models (Pasarkar et al., 2023).
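
As an illustration of this differentiability, the following PyTorch sketch (our own, not code from the cited papers) exposes VS to autograd so that its gradient can serve as a diversity signal; note that eigenvalue gradients can become unstable when the spectrum is degenerate.

```python
import torch

def vendi_score_torch(X: torch.Tensor) -> torch.Tensor:
    """Differentiable VS from a batch of feature vectors X of shape (n, d)."""
    X = torch.nn.functional.normalize(X, dim=1)
    n = X.shape[0]
    # eigvalsh is differentiable; gradients assume non-degenerate eigenvalues.
    lam = torch.linalg.eigvalsh(X @ X.T / n)
    lam = lam.clamp(min=1e-12)   # guard log(0) while keeping autograd stable
    return torch.exp(-(lam * lam.log()).sum())

# Gradient ascent on VS pushes a batch of embeddings apart, e.g. as an
# intrinsic diversity reward added to a generative model's training loss.
X = torch.randn(8, 16, requires_grad=True)
vendi_score_torch(X).backward()   # X.grad now holds the diversity gradient
```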

7. Limitations and Caveats

While the Vendi Score powerfully generalizes sample diversity estimation, several caveats warrant consideration:

  • VS’s output and sensitivity are entirely determined by the chosen similarity kernel. Poor selection of $k$ may obscure meaningful diversity or exaggerate irrelevant features.
  • Interpretation is strictly internal to the sample—VS does not measure how “novel” a sample is with respect to an external reference or ground truth.
  • Computational cost (eigenvalue calculation) may be prohibitive for large sample sets, although spectral approximation techniques (Nyström, random sampling, truncated eigenspectrum) can mitigate it.
  • In cases with ambiguous similarities (e.g., multi-modal data with shared modes), VS’s global entropy may average over modes, requiring careful analysis or per-mode computation for fine-grained insight.

A plausible implication is that VS is best employed in conjunction with other metrics, especially when used for downstream model evaluation or in high-dimensional, heavily correlated domains.

Summary

The Vendi Score is a mathematically rigorous, reference-free diversity metric defined as the exponential of the entropy (Shannon or Rényi) of the eigenvalues of a sample similarity matrix. By utilizing a user-defined similarity function, VS flexibly adapts to arbitrary notions of diversity, supporting fine-grained analysis and optimization. Its interpretability as an “effective number” of distinct samples, sensitivity to duplication and clustering, and differentiability make it suitable for generative model evaluation, data curation, ecological analysis, and scientific exploration. VS advances diversity estimation beyond traditional metrics, accommodating unsupervised, similarity-weighted, and abundance-sensitive analyses while remaining computationally tractable via spectral methods. This metric underpins a growing body of work in diverse fields and serves as a tool for critical evaluation and enhancement of data-driven scientific and machine learning systems (Friedman et al., 2022, Pasarkar et al., 2023).
