Conditional & Information Vendi Scores Overview

Updated 17 April 2026
  • Conditional and Information Vendi Scores are advanced measures that generalize classical entropy through kernel-based Rényi entropy to quantify dataset diversity and information loss.
  • They enable fine-grained decomposition of diversity into intrinsic and prompt-induced components, aiding model evaluation and active learning in high-dimensional spaces.
  • Efficient computation via eigendecomposition and scalable kernel approximations makes these scores practical for diagnosing generative models and calibrating probabilistic outputs.

Conditional and Information Vendi Scores are advanced information-theoretic quantities for quantifying diversity, informativeness, and information loss in datasets, generative models, and the outputs of machine learning algorithms. Rooted in matrix-based Rényi entropy and kernel similarity measures, these scores generalize classical mutual information and entropy to the setting of similarity structures and conditional information, with practical and theoretical utility in active learning, model evaluation, and probabilistic calibration. Their development addresses limitations of standard metrics—such as inapplicability to high-dimensional sample spaces, insensitivity to feature similarity, and inability to decompose conditional versus marginal diversity—enabling fine-grained analysis of diversity, alignment, and residual variability across modern machine learning modalities (Nguyen et al., 13 May 2025, Jalali et al., 2024, Nguyen et al., 12 Sep 2025, Charpentier et al., 16 Mar 2026).

1. Foundations: Vendi Score and Vendi Entropy

The Vendi Score is a spectrum-based generalization of classical entropy that measures the diversity of a set $D = \{ \theta_1, \dots, \theta_n \}$ in a space $\Theta$ equipped with a normalized positive-semidefinite kernel $k : \Theta \times \Theta \rightarrow \mathbb{R}$, where $k(\theta,\theta) = 1$. The kernel Gram matrix $K_{ij} = k(\theta_i, \theta_j)$ admits eigenvalues $\lambda_1,\ldots,\lambda_n$, yielding normalized weights $\bar\lambda_i = \lambda_i / \mathrm{tr}(K)$.

The order-$q$ Vendi Score is

$$\mathrm{VS}_q(D; k) = \left( \sum_{i=1}^n (\bar{\lambda}_i)^q \right)^{1/(1-q)} = \exp\left( H_V(D; q) \right),$$

where the Vendi entropy $H_V(D; q) = \frac{1}{1-q} \log \left( \sum_i (\bar{\lambda}_i)^q \right)$ is the Rényi entropy of the Gram spectrum (Nguyen et al., 13 May 2025, Jalali et al., 2024). For $q \to 1$, $H_V(D; q)$ becomes the Shannon entropy of the normalized spectrum; for a particular other choice of $q$, it yields the log effective dimension.

This construction quantifies sample-set diversity under an arbitrary similarity measure, without explicit densities or probability distributions. When the kernel collapses to the identity, $k(\theta, \theta') = \mathbb{1}[\theta = \theta']$, the Vendi entropy recovers the discrete entropy of the empirical distribution, and the Vendi Score is its exponential.
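As a minimal sketch of the definitions above (the function names `vendi_entropy` and `vendi_score` are illustrative, not taken from the cited papers), the score can be computed from the eigenvalues of the trace-normalized Gram matrix. The toy label array is an assumption used only to check the identity-kernel case numerically.

```python
import numpy as np

def vendi_entropy(K, q=1.0):
    """Order-q Renyi entropy of the trace-normalized spectrum of Gram matrix K."""
    K = np.asarray(K, dtype=float)
    lam = np.linalg.eigvalsh(K / np.trace(K))  # normalized eigenvalues sum to 1
    lam = lam[lam > 1e-12]                     # drop numerical zeros
    if np.isclose(q, 1.0):                     # Shannon limit q -> 1
        return float(-np.sum(lam * np.log(lam)))
    return float(np.log(np.sum(lam ** q)) / (1.0 - q))

def vendi_score(K, q=1.0):
    """Exponential of the Vendi entropy."""
    return float(np.exp(vendi_entropy(K, q)))

# Identity (indicator) kernel on a toy label set: k(a, b) = 1 if a == b else 0.
labels = np.array([0, 0, 0, 1, 1, 2])
K_ind = (labels[:, None] == labels[None, :]).astype(float)

# Discrete Shannon entropy of the empirical distribution, for comparison.
p = np.bincount(labels) / labels.size
shannon = float(-np.sum(p * np.log(p)))

print(vendi_entropy(K_ind), shannon)
```

Under the indicator kernel the Gram matrix is block-diagonal with all-ones blocks, so its normalized eigenvalues are exactly the empirical class frequencies; this is why the two printed values agree.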

2. Conditional Vendi Scores: Matrix-based and Sample-partitioned Forms

Matrix-based conditional Vendi entropy quantifies internal diversity given auxiliary variables (prompts, classes, modalities). Given sample-prompt pairs $\{(x_i, t_i)\}_{i=1}^n$ and kernels $k_X, k_T$ on samples and prompts with respective Gram matrices $K_X, K_T$, the joint Gram matrix $K_X \odot K_T$ (Hadamard product) quantifies joint similarity. Writing $H_q(K)$ for the order-$q$ Vendi entropy of the trace-normalized spectrum of $K$, the conditional Vendi entropy is (Jalali et al., 2024): $H_q(X \mid T) = H_q(K_X \odot K_T) - H_q(K_T)$, with conditional Vendi score

$$\mathrm{VS}_q(X \mid T) = \exp\left( H_q(X \mid T) \right).$$
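A minimal sketch of the matrix-based form, assuming Gaussian kernels on both samples and prompts and random arrays as stand-ins for real embeddings. The helper names (`spectral_entropy`, `gaussian_gram`, `conditional_vendi`) are illustrative and not drawn from the cited work.

```python
import numpy as np

def spectral_entropy(K, q=1.0):
    """Order-q Renyi entropy of the trace-normalized eigenvalues of K."""
    lam = np.linalg.eigvalsh(K / np.trace(K))
    lam = lam[lam > 1e-12]
    if np.isclose(q, 1.0):
        return float(-np.sum(lam * np.log(lam)))
    return float(np.log(np.sum(lam ** q)) / (1.0 - q))

def gaussian_gram(Z, sigma=1.0):
    """Gaussian-kernel Gram matrix with unit diagonal."""
    sq = np.sum(Z ** 2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * Z @ Z.T, 0.0)
    np.fill_diagonal(d2, 0.0)  # enforce k(z, z) = 1 exactly
    return np.exp(-d2 / (2.0 * sigma ** 2))

def conditional_vendi(K_X, K_T, q=1.0):
    """exp(H(K_X * K_T) - H(K_T)), with * the Hadamard product."""
    h_cond = spectral_entropy(K_X * K_T, q) - spectral_entropy(K_T, q)
    return float(np.exp(h_cond))

rng = np.random.default_rng(0)
X = rng.normal(size=(30, 4))  # stand-in sample embeddings (assumption)
T = rng.normal(size=(30, 3))  # stand-in prompt embeddings (assumption)
K_X, K_T = gaussian_gram(X), gaussian_gram(T)
score = conditional_vendi(K_X, K_T)

# Sanity check: identical prompts give an all-ones K_T, so the conditional
# score reduces to the unconditional Vendi Score of the samples.
same_prompt = np.ones_like(K_X)
unconditional = float(np.exp(spectral_entropy(K_X)))
print(score, conditional_vendi(K_X, same_prompt), unconditional)
```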

Alternatively, in categorical (partitioned) settings, conditional Vendi entropy is the expectation of marginal Vendi entropies over condition values: $H_V(D \mid C; q) = \mathbb{E}_{c}\left[ H_V(D_c; q) \right] = \sum_c p(c)\, H_V(D_c; q)$, where $D_c$ is the subset of $D$ with label $c$ (Nguyen et al., 13 May 2025).
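The partitioned form can be sketched as a weighted average of per-subset entropies. The weights here are the empirical label frequencies, which is an assumption about how $p(c)$ is estimated; the function names are illustrative.

```python
import numpy as np

def spectral_entropy(K, q=1.0):
    """Order-q Renyi entropy of the trace-normalized eigenvalues of K."""
    lam = np.linalg.eigvalsh(K / np.trace(K))
    lam = lam[lam > 1e-12]
    if np.isclose(q, 1.0):
        return float(-np.sum(lam * np.log(lam)))
    return float(np.log(np.sum(lam ** q)) / (1.0 - q))

def partitioned_conditional_entropy(K, labels, q=1.0):
    """Average of per-label Vendi entropies, weighted by label frequency."""
    labels = np.asarray(labels)
    total = 0.0
    for c in np.unique(labels):
        idx = np.flatnonzero(labels == c)
        weight = idx.size / labels.size  # empirical p(c), an assumption
        total += weight * spectral_entropy(K[np.ix_(idx, idx)], q)
    return total

# Toy case: six mutually dissimilar items (identity Gram), split 4 / 2.
K = np.eye(6)
labels = np.array([0, 0, 0, 0, 1, 1])
h_cond = partitioned_conditional_entropy(K, labels)
print(h_cond)  # (4/6) * log(4) + (2/6) * log(2)
```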

Conditional Vendi scores robustly isolate "model-induced" variability by removing the component of diversity attributable to the conditioning variable (prompt, label, or information set). This makes them essential for disentangling intrinsic generation diversity from prompt-driven diversity in prompt-based generative models (Jalali et al., 2024).

3. Information Vendi Scores and Vendi Information Gain

The Information Vendi Score quantifies the mutual dependence between samples and conditioning variables using spectrum-based kernel entropy. With $K_X$ and $K_T$ the sample and prompt Gram matrices, $\odot$ the Hadamard product, and $H_q(\cdot)$ the order-$q$ Vendi entropy of a trace-normalized Gram matrix: $I_q(X; T) = H_q(K_X) + H_q(K_T) - H_q(K_X \odot K_T)$, with

$$\mathrm{Information\text{-}Vendi}_q(X; T) = \exp\left( I_q(X; T) \right).$$

The Vendi Information Gain (VIG) follows this formalism: $\mathrm{VIG}(D; C) = H_V(D; q) - H_V(D \mid C; q)$, and reduces to classical mutual information when the kernel is the identity (Nguyen et al., 13 May 2025). VIG has the following critical properties:

  • Asymmetry: VIG is not symmetric in its two arguments if distinct kernels are used.
  • Sensitivity to similarity: VIG decreases when conditional subsets become more similar, even if discrete entropies are unchanged.
  • Tractability: Requires only kernel computations and eigendecomposition; avoids density estimation.
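The similarity-sensitivity property can be illustrated with a small sketch, assuming a Gaussian kernel, the partitioned form of conditional entropy with empirical label weights, and a hypothetical two-cluster dataset on the real line. Moving the two labeled clusters closer together leaves each subset's internal entropy unchanged but lowers the marginal entropy, so VIG drops.

```python
import numpy as np

def spectral_entropy(K, q=1.0):
    """Order-q Renyi entropy of the trace-normalized eigenvalues of K."""
    lam = np.linalg.eigvalsh(K / np.trace(K))
    lam = lam[lam > 1e-12]
    if np.isclose(q, 1.0):
        return float(-np.sum(lam * np.log(lam)))
    return float(np.log(np.sum(lam ** q)) / (1.0 - q))

def gaussian_gram(Z, sigma=1.0):
    """Gaussian-kernel Gram matrix with unit diagonal."""
    sq = np.sum(Z ** 2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * Z @ Z.T, 0.0)
    np.fill_diagonal(d2, 0.0)
    return np.exp(-d2 / (2.0 * sigma ** 2))

def vendi_information_gain(K, labels, q=1.0):
    """Marginal Vendi entropy minus the label-weighted per-subset entropy."""
    labels = np.asarray(labels)
    h_cond = 0.0
    for c in np.unique(labels):
        idx = np.flatnonzero(labels == c)
        h_cond += (idx.size / labels.size) * spectral_entropy(K[np.ix_(idx, idx)], q)
    return spectral_entropy(K, q) - h_cond

def two_cluster_vig(gap):
    """VIG for two labeled clusters of three points whose centers are `gap` apart."""
    offsets = np.array([-0.1, 0.0, 0.1])
    pts = np.concatenate([-gap / 2 + offsets, gap / 2 + offsets])[:, None]
    labels = np.array([0, 0, 0, 1, 1, 1])
    return vendi_information_gain(gaussian_gram(pts), labels)

vig_far = two_cluster_vig(10.0)   # well-separated clusters
vig_near = two_cluster_vig(0.5)   # clusters that look alike under the kernel
print(vig_far, vig_near)
```

With well-separated clusters the Gram matrix is effectively block-diagonal and the gain approaches $\log 2$, the discrete mutual information between a point and its binary label.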

In active learning, a model-based version of VIG is used to score candidate batches $B$ as the expected entropy reduction in the predictive distribution over the unlabeled pool, formally

$$\mathrm{VIG}(B) = H_V\left( p(y_U \mid \mathcal{D}) \right) - \mathbb{E}_{y_B}\left[ H_V\left( p(y_U \mid \mathcal{D} \cup \{(B, y_B)\}) \right) \right],$$

where $p(y_U \mid \mathcal{D})$ is the posterior over pool labels given data $\mathcal{D}$, and the expectation is taken over the model's predicted labels $y_B$ for the batch (Nguyen et al., 12 Sep 2025).

4. Theoretical Properties and Decomposition

Vendi-based scores admit information-theoretic decompositions paralleling the chain rules for entropy. For kernel entropies,

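$$H_q(K_X) = H_q(X \mid T) + I_q(X; T),$$

so

$$\mathrm{VS}_q(X) = \mathrm{VS}_q(X \mid T) \cdot \mathrm{Information\text{-}Vendi}_q(X; T),$$

where $\mathrm{VS}_q(X) = \exp(H_q(K_X))$ is the marginal Vendi Score of the samples, $K_X$ is the sample Gram matrix, and $T$ denotes the conditioning prompts (Jalali et al., 2024).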
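The multiplicative decomposition can be checked numerically. The sketch below assumes Gaussian kernels and synthetic embeddings in which the samples partly depend on the prompts; all variable names are illustrative.

```python
import numpy as np

def spectral_entropy(K, q=1.0):
    """Order-q Renyi entropy of the trace-normalized eigenvalues of K."""
    lam = np.linalg.eigvalsh(K / np.trace(K))
    lam = lam[lam > 1e-12]
    if np.isclose(q, 1.0):
        return float(-np.sum(lam * np.log(lam)))
    return float(np.log(np.sum(lam ** q)) / (1.0 - q))

def gaussian_gram(Z, sigma=1.0):
    """Gaussian-kernel Gram matrix with unit diagonal."""
    sq = np.sum(Z ** 2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * Z @ Z.T, 0.0)
    np.fill_diagonal(d2, 0.0)
    return np.exp(-d2 / (2.0 * sigma ** 2))

rng = np.random.default_rng(1)
T = rng.normal(size=(40, 3))                  # stand-in prompt embeddings
X = np.hstack([T, rng.normal(size=(40, 2))])  # samples partly driven by prompts
K_X = gaussian_gram(X, sigma=2.0)
K_T = gaussian_gram(T, sigma=2.0)

h_x = spectral_entropy(K_X)            # marginal entropy of the samples
h_t = spectral_entropy(K_T)            # marginal entropy of the prompts
h_joint = spectral_entropy(K_X * K_T)  # joint entropy via Hadamard product

vendi = float(np.exp(h_x))
conditional_vendi = float(np.exp(h_joint - h_t))
information_vendi = float(np.exp(h_x + h_t - h_joint))

# Chain rule in multiplicative form.
print(vendi, conditional_vendi * information_vendi)
```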

The theoretical results include:

  • Submodularity: VIG is approximately submodular with respect to batch selection under mild exchangeability, which enables efficient greedy maximization with an accompanying regret bound (Nguyen et al., 12 Sep 2025).
  • Empirical interpretations: The conditional entropy computed from the Gram matrix equals the entropy of a joint empirical kernel covariance operator (Jalali et al., 2024).
  • Cluster-wise consistency: Conditional Vendi entropy approaches the mean per-cluster entropy (controlled by prompt clusters) under well-separated, low-variance clusters (Jalali et al., 2024).
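Greedy maximization of a set function, as referenced in the submodularity result, can be sketched generically. Here `gain` is a hypothetical placeholder for whatever batch-level VIG estimate is available; the toy coverage function used in the demonstration is a standard monotone submodular objective chosen only for illustration and is not VIG itself.

```python
from typing import Callable, Hashable, List, Sequence

def greedy_select(
    candidates: Sequence[Hashable],
    gain: Callable[[List[Hashable]], float],
    batch_size: int,
) -> List[Hashable]:
    """Pick items one at a time, each maximizing the marginal increase in `gain`."""
    selected: List[Hashable] = []
    remaining = list(candidates)
    for _ in range(min(batch_size, len(remaining))):
        current = gain(selected)
        best = max(remaining, key=lambda c: gain(selected + [c]) - current)
        selected.append(best)
        remaining.remove(best)
    return selected

# Toy stand-in objective: number of distinct elements covered by the batch.
covers = {"a": {1, 2, 3}, "b": {3, 4}, "c": {4, 5, 6, 7}, "d": {1}}

def coverage(batch: List[Hashable]) -> float:
    covered = set()
    for item in batch:
        covered |= covers[item]
    return float(len(covered))

batch = greedy_select(list(covers), coverage, batch_size=2)
print(batch, coverage(batch))
```

Each greedy step costs one `gain` evaluation per remaining candidate, which is what makes the approach practical when the exact optimum over all batches is intractable.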

For probabilistic scoring, any proper loss decomposes the expected score into a "conditional Vendi" term (proper-regret at an information level) and an "information Vendi" term (information loss from progressive refinement of conditional structure) plus residual uncertainty (Charpentier et al., 16 Mar 2026).

5. Estimation Algorithms and Computational Considerations

Computation of Vendi, conditional-Vendi, and information-Vendi scores involves:

  • Selecting representation embeddings (e.g., DINOv2 for images, Gemini CLIP for text).
  • Tuning Gaussian kernel bandwidths to ensure stability (bounded variance across runs).
  • Computing Gram matrices, possibly via fast low-rank approximations for large $n$.
  • Eigendecomposition of normalized Gram matrices for marginal, conditional, and joint structures.
  • Finite-sample estimation is direct and sample-efficient: no repeated generation per prompt is required (Jalali et al., 2024).
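The steps above can be sketched end to end, assuming random vectors as stand-ins for real embeddings and a simple bandwidth sweep as one possible way to inspect sensitivity to the kernel width. The sweep values are arbitrary, and the cited papers may select bandwidths differently.

```python
import numpy as np

def gaussian_gram(Z, sigma):
    """Gaussian-kernel Gram matrix with unit diagonal."""
    sq = np.sum(Z ** 2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * Z @ Z.T, 0.0)
    np.fill_diagonal(d2, 0.0)
    return np.exp(-d2 / (2.0 * sigma ** 2))

def vendi_score(K, q=1.0):
    """exp of the order-q Renyi entropy of the trace-normalized spectrum of K."""
    lam = np.linalg.eigvalsh(K / np.trace(K))
    lam = lam[lam > 1e-12]
    if np.isclose(q, 1.0):
        return float(np.exp(-np.sum(lam * np.log(lam))))
    return float(np.sum(lam ** q) ** (1.0 / (1.0 - q)))

rng = np.random.default_rng(0)
Z = rng.normal(size=(20, 8))  # hypothetical embeddings, 20 items in 8 dimensions

# Sweep the bandwidth: tiny sigma treats all items as distinct (score near n),
# huge sigma treats them as identical (score near 1).
sigmas = [1e-3, 1.0, 3.0, 10.0, 1e3]
scores = {s: vendi_score(gaussian_gram(Z, s)) for s in sigmas}
for s in sigmas:
    print(f"sigma={s:g}  vendi={scores[s]:.3f}")
```

Because the score saturates at both extremes, a bandwidth in the intermediate range is where it actually discriminates between more and less diverse sets.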

When applied to model-based settings such as active learning, entropy expectations are estimated using the model's predictive distribution, possibly with independence approximations for batch candidates.

The main computational bottleneck is the cubic complexity of full eigendecomposition in the number of samples. Scalable approximations (e.g., Nyström, random Fourier features) are identified as open problems (Jalali et al., 2024).
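One way such an approximation could look, offered as a sketch rather than a method from the cited work: random Fourier features give an explicit feature map $\Phi \in \mathbb{R}^{n \times m}$ whose inner products approximate a Gaussian kernel, and the nonzero eigenvalues of $\Phi\Phi^\top$ coincide with those of the much smaller $\Phi^\top\Phi$. The feature count, bandwidth, and data below are arbitrary assumptions.

```python
import numpy as np

def entropy_of_spectrum(M, q=1.0):
    """Order-q Renyi entropy of the trace-normalized eigenvalues of M."""
    lam = np.linalg.eigvalsh(M / np.trace(M))
    lam = lam[lam > 1e-12]
    if np.isclose(q, 1.0):
        return float(-np.sum(lam * np.log(lam)))
    return float(np.log(np.sum(lam ** q)) / (1.0 - q))

def rff_features(Z, sigma, num_features, rng):
    """Random Fourier features approximating exp(-||x - y||^2 / (2 sigma^2))."""
    W = rng.normal(scale=1.0 / sigma, size=(Z.shape[1], num_features))
    b = rng.uniform(0.0, 2.0 * np.pi, size=num_features)
    return np.sqrt(2.0 / num_features) * np.cos(Z @ W + b)

rng = np.random.default_rng(0)
Z = rng.normal(size=(200, 5))  # hypothetical embeddings
Phi = rff_features(Z, sigma=2.0, num_features=64, rng=rng)  # shape (200, 64)

# The 64 x 64 matrix shares its nonzero spectrum with the 200 x 200 Gram
# matrix Phi @ Phi.T, so the eigendecomposition cost depends on the number
# of features rather than the number of samples.
h_small = entropy_of_spectrum(Phi.T @ Phi)
h_full = entropy_of_spectrum(Phi @ Phi.T)
approx_score = float(np.exp(h_small))
print(h_small, h_full, approx_score)
```

The resulting score is bounded above by the number of random features, so the feature count has to exceed the diversity one hopes to measure.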

6. Empirical Validation and Applications

Conditional and information Vendi scores find use in:

  • Active learning: VIG provides a globally aware selection criterion that outperforms max-entropy, BALD, and CoreSet for ecological image labeling, including on the Snapshot Serengeti dataset (3.2M images, 10-class task). The greedy-batch VIG algorithm is reported to be empirically robust, scalable, and more label-efficient, achieving 5–7% higher accuracy than max-entropy/BALD, 3% higher than CoreSet with 30% fewer labels, and faster convergence (Nguyen et al., 12 Sep 2025).
  • Text-conditioned generation: Conditional-Vendi isolates model-induced from prompt-induced diversity in text-to-image/video/captioning tasks. Information-Vendi quantifies prompt-sample alignment (statistical relevance). Empirical results confirm that only Conditional-Vendi remains constant when diversity is manipulated solely via prompts; Information-Vendi is prompt-sensitive. Comparisons of modern generative models on COCO/ImageNet-derived prompts confirm verdicts match perceptual diversity and relevance (Jalali et al., 2024).
  • Probabilistic calibration and grouping losses: The information Vendi term quantifies non-recoverable information loss from lossy summary predictions or feature–score compressions, and the conditional Vendi term precisely measures calibration error at a given information level (Charpentier et al., 16 Mar 2026).

In each case, the ability to isolate internal (model-driven) and external (prompt- or condition-driven) diversity, and to separate calibration from information loss, is unattainable with classical entropy or mutual information alone.

7. Extensions, Limitations, and Open Directions

Extensions of Vendi-based scores include:

Limitations stem from computational cost for large $n$, embedding-induced biases (score quality is bounded by representational fidelity), and the restrictive assumptions (e.g., cluster separation, kernel normalization) underpinning certain theoretical guarantees (Jalali et al., 2024). In practice, efficient large-scale eigendecomposition remains an area for future development.

A plausible implication is that, as embeddings and similarity measures grow more accurate, matrix-based Vendi decompositions will become increasingly central for diagnosing, regularizing, and optimizing diversity and information flows in complex generative and predictive systems.
