Conditional & Information Vendi Scores Overview

Updated 17 April 2026

Conditional and Information Vendi Scores are advanced measures that generalize classical entropy through kernel-based Rényi entropy to quantify dataset diversity and information loss.
They enable fine-grained decomposition of diversity into intrinsic and prompt-induced components, aiding model evaluation and active learning in high-dimensional spaces.
Efficient computation via eigendecomposition and scalable kernel approximations makes these scores practical for diagnosing generative models and calibrating probabilistic outputs.

Conditional and Information Vendi Scores are advanced information-theoretic quantities for quantifying diversity, informativeness, and information loss in datasets, generative models, and the outputs of machine learning algorithms. Rooted in matrix-based Rényi entropy and kernel similarity measures, these scores generalize classical mutual information and entropy to the setting of similarity structures and conditional information, with practical and theoretical utility in active learning, model evaluation, and probabilistic calibration. Their development addresses limitations of standard metrics—such as inapplicability to high-dimensional sample spaces, insensitivity to feature similarity, and inability to decompose conditional versus marginal diversity—enabling fine-grained analysis of diversity, alignment, and residual variability across modern machine learning modalities (Nguyen et al., 13 May 2025, Jalali et al., 2024, Nguyen et al., 12 Sep 2025, Charpentier et al., 16 Mar 2026).

1. Foundations: Vendi Score and Vendi Entropy

The Vendi Score is a spectrum-based generalization of classical entropy that measures the diversity of a set $D = \{ \theta_1, \dots, \theta_n \}$ in a space $\Theta$ equipped with a normalized positive-semidefinite kernel $k : \Theta \times \Theta \rightarrow \mathbb{R}$, where $k(\theta,\theta) = 1$. The kernel Gram matrix $K_{ij} = k(\theta_i, \theta_j)$ has eigenvalues $\lambda_1,\ldots,\lambda_n$, yielding normalized weights $\bar\lambda_i = \lambda_i / \mathrm{tr}(K)$ (here $\mathrm{tr}(K) = n$, since every diagonal entry equals 1).

The order-$q$ Vendi Score is

$\mathrm{VS}_q(D; k) = \left( \sum_{i=1}^n (\bar{\lambda}_i)^q \right)^{1/(1-q)} = \exp\left( H_V(D; q) \right),$

where Vendi entropy $H_V(D; q) = \frac{1}{1-q} \log \left( \sum_i (\bar{\lambda}_i)^q \right)$ is the Rényi entropy of the Gram spectrum (Nguyen et al., 13 May 2025, Jalali et al., 2024). In the limit $q \to 1$, $H_V(D; q)$ becomes the Shannon (von Neumann) entropy $-\sum_i \bar\lambda_i \log \bar\lambda_i$ of the normalized spectrum; for $q = 2$, it yields the log effective dimension $-\log \sum_i \bar\lambda_i^2$.
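A minimal NumPy sketch of the definition above, assuming a Gaussian (RBF) kernel on Euclidean embeddings; the names `gaussian_gram`, `vendi_entropy`, and `vendi_score` are illustrative and not taken from any Vendi package. It eigendecomposes the Gram matrix, normalizes the spectrum by its sum (equal to $\mathrm{tr}(K)$ up to round-off), and applies the order-$q$ Rényi formula, handling the Shannon limit $q \to 1$ separately.

```python
import numpy as np


def gaussian_gram(X: np.ndarray, bandwidth: float) -> np.ndarray:
    """Gaussian (RBF) Gram matrix; k(x, x) = 1, so the kernel is normalized."""
    sq = np.sum(X**2, axis=1)
    # Pairwise squared distances, clipped at 0 to absorb round-off.
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)
    return np.exp(-d2 / (2.0 * bandwidth**2))


def vendi_entropy(K: np.ndarray, q: float = 1.0) -> float:
    """Order-q Rényi entropy of the trace-normalized spectrum of a PSD Gram matrix."""
    lam = np.clip(np.linalg.eigvalsh(K), 0.0, None)  # drop tiny negative round-off
    lam = lam / lam.sum()                            # lambda_bar_i = lambda_i / tr(K)
    lam = lam[lam > 1e-12]                           # ignore numerically zero eigenvalues
    if np.isclose(q, 1.0):
        return float(-np.sum(lam * np.log(lam)))     # Shannon (von Neumann) limit
    return float(np.log(np.sum(lam**q)) / (1.0 - q))


def vendi_score(K: np.ndarray, q: float = 1.0) -> float:
    """VS_q = exp(H_V): an effective number of distinct items, between 1 and n."""
    return float(np.exp(vendi_entropy(K, q)))


rng = np.random.default_rng(0)
near_duplicates = rng.normal(scale=0.01, size=(50, 8))  # 50 almost identical points
well_separated = rng.normal(scale=3.0, size=(50, 8))    # 50 mutually distant points
print(vendi_score(gaussian_gram(near_duplicates, bandwidth=1.0)))  # close to 1
print(vendi_score(gaussian_gram(well_separated, bandwidth=1.0)))   # close to 50
```

The two toy sets bracket the score's range: near-duplicates behave like a single item, while points that are far apart relative to the bandwidth behave like $n$ distinct items.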

This construction quantifies sample-set diversity under an arbitrary similarity measure, without explicit densities or probability distributions. When the kernel collapses to the Kronecker delta $k(\theta, \theta') = \mathbb{1}[\theta = \theta']$, Vendi entropy recovers the (Rényi) entropy of the empirical distribution over distinct items, and the Vendi Score its exponential.
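A quick numerical check of this reduction, assuming a Kronecker-delta kernel on categorical items. `vendi_entropy` is an illustrative helper computing the Rényi entropy of the trace-normalized Gram spectrum, and `renyi_entropy` is the classical Rényi entropy of a probability vector. Each group of identical items forms an all-ones block with a single nonzero eigenvalue equal to the group's count, so the normalized spectrum is exactly the empirical distribution.

```python
import numpy as np
from collections import Counter


def vendi_entropy(K: np.ndarray, q: float = 1.0) -> float:
    """Order-q Rényi entropy of the trace-normalized spectrum of a PSD Gram matrix."""
    lam = np.clip(np.linalg.eigvalsh(K), 0.0, None)
    lam = lam / lam.sum()
    lam = lam[lam > 1e-12]
    if np.isclose(q, 1.0):
        return float(-np.sum(lam * np.log(lam)))
    return float(np.log(np.sum(lam**q)) / (1.0 - q))


def renyi_entropy(p: np.ndarray, q: float = 1.0) -> float:
    """Classical order-q Rényi entropy of a probability vector."""
    p = p[p > 0]
    if np.isclose(q, 1.0):
        return float(-np.sum(p * np.log(p)))
    return float(np.log(np.sum(p**q)) / (1.0 - q))


items = ["a", "a", "a", "b", "b", "c"]
# Kronecker-delta kernel: k(x, y) = 1 if x == y else 0 (normalized: k(x, x) = 1).
K = np.array([[1.0 if x == y else 0.0 for y in items] for x in items])
p = np.array(list(Counter(items).values()), dtype=float) / len(items)  # [1/2, 1/3, 1/6]

for q in (1.0, 2.0):
    print(q, vendi_entropy(K, q), renyi_entropy(p, q))  # the two entropies agree
```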

2. Conditional Vendi Scores: Matrix-based and Sample-partitioned Forms

Matrix-based conditional Vendi entropy quantifies internal diversity given auxiliary variables (prompts, classes, modalities). Given sample-prompt pairs $\{(x_i, t_i)\}_{i=1}^n$ and kernels $k_X$ and $k_T$ on samples and prompts with respective Gram matrices $K_X$ and $K_T$, the joint Gram matrix $K_X \odot K_T$ (Hadamard product) quantifies joint similarity. Writing $H_q(K)$ for the order-$q$ Vendi entropy of the trace-normalized spectrum of a Gram matrix $K$, the conditional Vendi entropy is (Jalali et al., 2024): $H_q(X \mid T) = H_q(K_X \odot K_T) - H_q(K_T)$, with conditional Vendi score

$\mathrm{Vendi}_q(X \mid T) = \exp\left( H_q(X \mid T) \right).$
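A minimal sketch of the matrix-based conditional score, assuming normalized kernels (unit diagonal) on both samples and prompts, so that the Hadamard product is again a normalized Gram matrix. `conditional_vendi` is a hypothetical helper name, and the random vectors stand in for real sample and prompt embeddings. Two limiting cases make the definition concrete: a single shared prompt ($K_T$ all ones) leaves $H_q(X \mid T) = H_q(K_X)$, while identical samples ($K_X$ all ones) give $H_q(X \mid T) = 0$, i.e. no diversity beyond what the prompts induce.

```python
import numpy as np


def vendi_entropy(K: np.ndarray, q: float = 1.0) -> float:
    """Order-q Rényi entropy of the trace-normalized spectrum of a PSD Gram matrix."""
    lam = np.clip(np.linalg.eigvalsh(K), 0.0, None)
    lam = lam / lam.sum()
    lam = lam[lam > 1e-12]
    if np.isclose(q, 1.0):
        return float(-np.sum(lam * np.log(lam)))
    return float(np.log(np.sum(lam**q)) / (1.0 - q))


def gaussian_gram(X: np.ndarray, bandwidth: float) -> np.ndarray:
    """Gaussian (RBF) Gram matrix with unit diagonal."""
    sq = np.sum(X**2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)
    return np.exp(-d2 / (2.0 * bandwidth**2))


def conditional_vendi(KX: np.ndarray, KT: np.ndarray, q: float = 1.0) -> tuple[float, float]:
    """Return (H_q(X | T), Vendi_q(X | T)) using the Hadamard joint Gram matrix."""
    h_joint = vendi_entropy(KX * KT, q)   # H_q(K_X ⊙ K_T): element-wise product
    h_prompt = vendi_entropy(KT, q)       # H_q(K_T)
    h_cond = h_joint - h_prompt
    return h_cond, float(np.exp(h_cond))


rng = np.random.default_rng(1)
prompts = rng.normal(size=(40, 16))                    # stand-in prompt embeddings
samples = prompts + 0.5 * rng.normal(size=(40, 16))    # samples loosely tied to prompts
KX = gaussian_gram(samples, bandwidth=4.0)
KT = gaussian_gram(prompts, bandwidth=4.0)
h, score = conditional_vendi(KX, KT)
print(f"H(X|T) = {h:.3f}, conditional Vendi = {score:.2f}")
```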

Alternatively, in categorical (partitioned) settings, conditional Vendi entropy is the expectation of marginal Vendi entropies over condition values: $H_V(D \mid Y; q) = \sum_{y} \frac{|D_y|}{|D|} H_V(D_y; q)$, where $D_y$ is the subset of $D$ with label $y$ (Nguyen et al., 13 May 2025).
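The partitioned form needs only within-group Gram submatrices. The sketch below uses a hypothetical helper `partitioned_conditional_entropy` with empirical weights $|D_y|/|D|$, and also checks a relationship that follows from the block structure: if the prompt kernel is the Kronecker delta on labels, then at $q = 1$ the matrix-based quantity $H_1(K_X \odot K_T) - H_1(K_T)$ coincides with this label-weighted average. The reason is that $K_X \odot K_T$ becomes block diagonal (up to a permutation), so its normalized spectrum is the union of the per-label spectra scaled by $|D_y|/|D|$. For $q \neq 1$ the two forms generally differ.

```python
import numpy as np


def vendi_entropy(K: np.ndarray, q: float = 1.0) -> float:
    """Order-q Rényi entropy of the trace-normalized spectrum of a PSD Gram matrix."""
    lam = np.clip(np.linalg.eigvalsh(K), 0.0, None)
    lam = lam / lam.sum()
    lam = lam[lam > 1e-12]
    if np.isclose(q, 1.0):
        return float(-np.sum(lam * np.log(lam)))
    return float(np.log(np.sum(lam**q)) / (1.0 - q))


def partitioned_conditional_entropy(KX: np.ndarray, labels, q: float = 1.0) -> float:
    """H_V(D | Y; q) = sum_y (|D_y| / |D|) * H_V(D_y; q)."""
    labels = np.asarray(labels)
    n = len(labels)
    total = 0.0
    for y in np.unique(labels):
        idx = np.flatnonzero(labels == y)
        sub = KX[np.ix_(idx, idx)]            # Gram matrix restricted to D_y
        total += (len(idx) / n) * vendi_entropy(sub, q)
    return total


rng = np.random.default_rng(2)
X = rng.normal(size=(30, 5))
labels = rng.integers(0, 3, size=30)
sq = np.sum(X**2, axis=1)
KX = np.exp(-np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0) / 2.0)  # RBF, bandwidth 1
KT = (labels[:, None] == labels[None, :]).astype(float)                        # delta kernel on labels

h_partitioned = partitioned_conditional_entropy(KX, labels, q=1.0)
h_matrix = vendi_entropy(KX * KT, 1.0) - vendi_entropy(KT, 1.0)
print(h_partitioned, h_matrix)  # agree up to round-off at q = 1
```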

Conditional Vendi scores robustly isolate "model-induced" variability by removing the component of diversity attributable to the conditioning variable (prompt, label, or information set). This makes them essential for disentangling intrinsic generation diversity from prompt-driven diversity in prompt-based generative models (Jalali et al., 2024).

3. Information Vendi Scores and Vendi Information Gain

The Information Vendi Score quantifies the mutual dependence between samples and conditioning variables using spectrum-based kernel entropy: $I_q(X; T) = H_q(K_X) + H_q(K_T) - H_q(K_X \odot K_T)$, with

$\mathrm{InfoVendi}_q(X; T) = \exp\left( I_q(X; T) \right).$
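A minimal sketch of the information score, assuming normalized kernels on samples and prompts; `information_vendi` is a hypothetical helper. Two sanity checks follow directly from the definition: with a single shared prompt ($K_T$ all ones) the information term is zero, and when samples and prompts share the same Kronecker-delta kernel over labels, $I_q$ equals the label entropy.

```python
import numpy as np


def vendi_entropy(K: np.ndarray, q: float = 1.0) -> float:
    """Order-q Rényi entropy of the trace-normalized spectrum of a PSD Gram matrix."""
    lam = np.clip(np.linalg.eigvalsh(K), 0.0, None)
    lam = lam / lam.sum()
    lam = lam[lam > 1e-12]
    if np.isclose(q, 1.0):
        return float(-np.sum(lam * np.log(lam)))
    return float(np.log(np.sum(lam**q)) / (1.0 - q))


def information_vendi(KX: np.ndarray, KT: np.ndarray, q: float = 1.0) -> tuple[float, float]:
    """Return (I_q(X; T), InfoVendi_q(X; T)) with I_q = H_q(K_X) + H_q(K_T) - H_q(K_X ⊙ K_T)."""
    info = vendi_entropy(KX, q) + vendi_entropy(KT, q) - vendi_entropy(KX * KT, q)
    return info, float(np.exp(info))


labels = np.array([0, 0, 1, 1, 1, 2])
K_delta = (labels[:, None] == labels[None, :]).astype(float)  # delta kernel on labels
p = np.bincount(labels) / len(labels)                         # [1/3, 1/2, 1/6]
print(information_vendi(K_delta, K_delta))          # I equals the label entropy
print(information_vendi(K_delta, np.ones((6, 6))))  # single shared prompt: I = 0
```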

The Vendi Information Gain (VIG) follows this formalism: $\mathrm{VIG}(D; Y) = H_V(D; q) - H_V(D \mid Y; q)$, and it reduces to classical mutual information when the kernel is the identity (Kronecker-delta) kernel (Nguyen et al., 13 May 2025). VIG has the following key properties:

Asymmetry: VIG is not symmetric in its two arguments if distinct kernels are used.
Sensitivity to similarity: VIG decreases when conditional subsets become more similar, even if discrete entropies are unchanged.
Tractability: Requires only kernel computations and eigendecomposition; avoids density estimation.
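A minimal sketch of VIG as "marginal Vendi entropy minus label-weighted conditional Vendi entropy", assuming the partitioned conditional form with empirical weights $|D_y|/|D|$; `vendi_information_gain` and `mutual_information` are hypothetical helpers. With a Kronecker-delta kernel on discrete observations and $q = 1$, the result matches the plug-in mutual information computed from counts, as stated above; only kernel evaluations and eigendecompositions are needed.

```python
import numpy as np


def vendi_entropy(K: np.ndarray, q: float = 1.0) -> float:
    """Order-q Rényi entropy of the trace-normalized spectrum of a PSD Gram matrix."""
    lam = np.clip(np.linalg.eigvalsh(K), 0.0, None)
    lam = lam / lam.sum()
    lam = lam[lam > 1e-12]
    if np.isclose(q, 1.0):
        return float(-np.sum(lam * np.log(lam)))
    return float(np.log(np.sum(lam**q)) / (1.0 - q))


def vendi_information_gain(K: np.ndarray, labels, q: float = 1.0) -> float:
    """VIG = H_V(D; q) - sum_y (|D_y| / |D|) * H_V(D_y; q)."""
    labels = np.asarray(labels)
    n = len(labels)
    h_cond = 0.0
    for y in np.unique(labels):
        idx = np.flatnonzero(labels == y)
        h_cond += (len(idx) / n) * vendi_entropy(K[np.ix_(idx, idx)], q)
    return vendi_entropy(K, q) - h_cond


def mutual_information(x, y) -> float:
    """Plug-in mutual information (in nats) of two discrete sequences."""
    x, y = np.asarray(x), np.asarray(y)
    mi = 0.0
    for a in np.unique(x):
        for b in np.unique(y):
            p_ab = np.mean((x == a) & (y == b))
            if p_ab > 0:
                mi += p_ab * np.log(p_ab / (np.mean(x == a) * np.mean(y == b)))
    return float(mi)


rng = np.random.default_rng(3)
y = rng.integers(0, 2, size=200)                # conditioning labels
x = np.where(rng.random(200) < 0.8, y, 1 - y)   # observations correlated with y
K = (x[:, None] == x[None, :]).astype(float)    # delta kernel on observations
print(vendi_information_gain(K, y), mutual_information(x, y))  # equal at q = 1
```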

In active learning, a model-based version of VIG is used to score candidate batches $B$ by the expected entropy reduction in the predictive distribution over the unlabeled pool, formally

$\mathrm{VIG}(B) = H_V\left( p(Y_{\mathcal{U}} \mid \mathcal{D}) \right) - \mathbb{E}_{y_B \sim p(y_B \mid \mathcal{D})} \left[ H_V\left( p(Y_{\mathcal{U}} \mid \mathcal{D} \cup \{(x_B, y_B)\}) \right) \right],$

where $p(Y_{\mathcal{U}} \mid \mathcal{D})$ is the posterior over pool labels $Y_{\mathcal{U}}$ given data $\mathcal{D}$ (Nguyen et al., 12 Sep 2025).

4. Theoretical Properties and Decomposition

Vendi-based scores admit information-theoretic decompositions paralleling the chain rules for entropy. For kernel entropies,

$H_q(K_X \odot K_T) = H_q(K_T) + H_q(X \mid T),$

$I_q(X; T) = H_q(K_X) - H_q(X \mid T)$

(Jalali et al., 2024).

The theoretical results include:

Submodularity: VIG is approximately submodular with respect to batch selection under mild exchangeability assumptions, which enables efficient greedy maximization (yielding a $(1 - 1/e)$-approximation bound) (Nguyen et al., 12 Sep 2025).
Empirical interpretations: The conditional entropy computed from the Gram matrix equals the entropy of a joint empirical kernel covariance operator (Jalali et al., 2024).
Cluster-wise consistency: Conditional Vendi entropy approaches the mean per-cluster entropy (controlled by prompt clusters) under well-separated, low-variance clusters (Jalali et al., 2024).
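Approximate submodularity is what licenses a simple greedy loop. The sketch below shows that loop with a pluggable set objective; `greedy_batch` is a hypothetical helper, and the demo objective (the Vendi score of the selected points' Gram submatrix) is a stand-in diversity criterion, not the model-based VIG objective of Nguyen et al. (12 Sep 2025), which would require a predictive model over the pool.

```python
import numpy as np
from typing import Callable, List, Sequence


def greedy_batch(candidates: Sequence[int],
                 objective: Callable[[List[int]], float],
                 batch_size: int) -> List[int]:
    """Repeatedly add the candidate whose inclusion gives the largest objective value."""
    selected: List[int] = []
    remaining = list(candidates)
    for _ in range(min(batch_size, len(remaining))):
        best = max(remaining, key=lambda c: objective(selected + [c]))  # ties keep the first
        selected.append(best)
        remaining.remove(best)
    return selected


def vendi_score(K: np.ndarray, q: float = 1.0) -> float:
    """exp of the order-q Rényi entropy of the trace-normalized Gram spectrum."""
    lam = np.clip(np.linalg.eigvalsh(K), 0.0, None)
    lam = lam / lam.sum()
    lam = lam[lam > 1e-12]
    if np.isclose(q, 1.0):
        return float(np.exp(-np.sum(lam * np.log(lam))))
    return float(np.exp(np.log(np.sum(lam**q)) / (1.0 - q)))


# Toy pool: two tight clusters far apart, Gaussian kernel with bandwidth 1.
rng = np.random.default_rng(4)
pool = np.vstack([rng.normal(0.0, 0.05, size=(10, 2)), rng.normal(10.0, 0.05, size=(10, 2))])
sq = np.sum(pool**2, axis=1)
K = np.exp(-np.maximum(sq[:, None] + sq[None, :] - 2.0 * pool @ pool.T, 0.0) / 2.0)

batch = greedy_batch(range(len(pool)), lambda S: vendi_score(K[np.ix_(S, S)]), batch_size=2)
print(batch)  # one index from each cluster
```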

For probabilistic scoring, any proper loss decomposes the expected score into a "conditional Vendi" term (proper-regret at an information level) and an "information Vendi" term (information loss from progressive refinement of conditional structure) plus residual uncertainty (Charpentier et al., 16 Mar 2026).
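For intuition, the squared-error (Brier) case of such a three-term split can be computed exactly on a finite toy distribution. This sketch assumes the classical decomposition $\mathbb{E}[(f - Y)^2] = \mathbb{E}[(f - C)^2] + \mathbb{E}[(C - Q)^2] + \mathbb{E}[(Q - Y)^2]$, with $C = \mathbb{E}[Y \mid f(X)]$ and $Q = \mathbb{E}[Y \mid X]$: the first term is the calibration (reliability) part, the second the information lost by the forecast's coarser grouping, and the third the irreducible uncertainty. It mirrors the roles described above but is not presented as the general formulation of Charpentier et al. (16 Mar 2026); the probabilities and forecast values are made up for illustration.

```python
import numpy as np

# Hypothetical toy setup: X takes 6 values with these probabilities, and
# Q[x] = P(Y = 1 | X = x) is the ideal, fully informed forecast.
p_x = np.array([0.1, 0.2, 0.15, 0.25, 0.2, 0.1])
Q = np.array([0.05, 0.2, 0.4, 0.6, 0.8, 0.95])

# A lossy, miscalibrated forecaster that only sees which of 3 groups x falls in.
group = np.array([0, 0, 1, 1, 2, 2])
f_by_group = np.array([0.2, 0.45, 0.9])  # distinct values, so conditioning on f = conditioning on group
f = f_by_group[group]

# C = E[Y | f(X)]: the recalibrated forecast at the forecaster's information level.
C_by_group = np.array([
    np.sum(p_x[group == g] * Q[group == g]) / np.sum(p_x[group == g]) for g in range(3)
])
C = C_by_group[group]

expected_brier = np.sum(p_x * (Q * (1 - f) ** 2 + (1 - Q) * f**2))  # E[(f - Y)^2], Y ~ Bernoulli(Q)
reliability = np.sum(p_x * (f - C) ** 2)       # calibration error at f's information level
information_loss = np.sum(p_x * (C - Q) ** 2)  # what grouping x into 3 bins discards
uncertainty = np.sum(p_x * Q * (1 - Q))        # E[(Q - Y)^2], irreducible given X

print(expected_brier, reliability + information_loss + uncertainty)  # equal up to round-off
```

The cross terms vanish by the tower property: $f - C$ has zero conditional correlation with $C - Q$ given $f(X)$, and $Q - Y$ has zero mean given $X$.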

5. Estimation Algorithms and Computational Considerations

Computation of Vendi, conditional-Vendi, and information-Vendi scores involves:

Selecting representation embeddings (e.g., DINOv2 for images, Gemini CLIP for text).
Tuning Gaussian kernel bandwidths so that scores are stable (low variance) across runs.
Computing Gram matrices, possibly via fast low-rank approximations for large $n$.
Eigendecomposition of normalized Gram matrices for marginal, conditional, and joint structures.
Algorithmic pseudocode for finite-sample estimation is direct and sample-efficient: no repeated generation per prompt is required (Jalali et al., 2024).
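The bandwidth-tuning step can be sketched as a subsampling study: for each candidate bandwidth, recompute the Vendi score on random subsets and report its mean and spread. The helper name `bandwidth_stability`, the bandwidth grid, the subset size, and the number of runs are illustrative assumptions, and the random vectors stand in for real image or text embeddings; the sketch does not encode any particular acceptance threshold.

```python
import numpy as np


def gaussian_gram(X: np.ndarray, bandwidth: float) -> np.ndarray:
    """Gaussian (RBF) Gram matrix with unit diagonal."""
    sq = np.sum(X**2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)
    return np.exp(-d2 / (2.0 * bandwidth**2))


def vendi_score(K: np.ndarray, q: float = 1.0) -> float:
    """exp of the order-q Rényi entropy of the trace-normalized Gram spectrum."""
    lam = np.clip(np.linalg.eigvalsh(K), 0.0, None)
    lam = lam / lam.sum()
    lam = lam[lam > 1e-12]
    if np.isclose(q, 1.0):
        return float(np.exp(-np.sum(lam * np.log(lam))))
    return float(np.exp(np.log(np.sum(lam**q)) / (1.0 - q)))


def bandwidth_stability(X, bandwidths, subset_size=100, runs=10, q=1.0, seed=0):
    """Map each bandwidth to (mean, std) of the Vendi score over random subsets of X."""
    rng = np.random.default_rng(seed)
    stats = {}
    for sigma in bandwidths:
        scores = []
        for _ in range(runs):
            idx = rng.choice(len(X), size=min(subset_size, len(X)), replace=False)
            scores.append(vendi_score(gaussian_gram(X[idx], sigma), q))
        stats[sigma] = (float(np.mean(scores)), float(np.std(scores)))
    return stats


embeddings = np.random.default_rng(5).normal(size=(500, 32))  # stand-in for real embeddings
stats = bandwidth_stability(embeddings, [2.0, 4.0, 8.0])
for sigma, (mean, std) in stats.items():
    print(f"bandwidth={sigma}: score mean={mean:.2f}, std={std:.2f}")
```

Larger bandwidths make more points look alike, so the score shrinks; the spread across subsets indicates how sensitive a given bandwidth is to sampling.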

When applied to model-based settings such as active learning, entropy expectations are estimated using the model's predictive distribution, possibly with independence approximations for batch candidates.

The main computational bottleneck is full eigendecomposition, whose cost is cubic in the number of samples. Scalable approximations (e.g., Nyström, random Fourier features) are identified as open directions (Jalali et al., 2024).
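One of the approximations named above, random Fourier features, can be sketched as follows. It assumes a Gaussian kernel with bandwidth $\sigma$, approximated by $D$ features $\varphi(x) = \sqrt{2/D}\,\cos(W^\top x + b)$ with $W \sim \mathcal{N}(0, \sigma^{-2} I)$ and $b \sim \mathrm{U}[0, 2\pi]$ (the standard random-Fourier-feature construction); `rff_features` and `vendi_score_from_features` are hypothetical helpers. Because $\Phi\Phi^\top$ ($n \times n$) and $\Phi^\top\Phi$ ($D \times D$) share their nonzero eigenvalues, the spectrum can be taken from the smaller matrix, so the cost is $O(nD^2 + D^3)$ rather than $O(n^3)$.

```python
import numpy as np


def rff_features(X: np.ndarray, bandwidth: float, num_features: int, seed: int = 0) -> np.ndarray:
    """Random Fourier features whose inner products approximate exp(-||x - y||^2 / (2 bandwidth^2))."""
    rng = np.random.default_rng(seed)
    W = rng.normal(scale=1.0 / bandwidth, size=(X.shape[1], num_features))
    b = rng.uniform(0.0, 2.0 * np.pi, size=num_features)
    return np.sqrt(2.0 / num_features) * np.cos(X @ W + b)


def entropy_from_spectrum(lam: np.ndarray, q: float = 1.0) -> float:
    """Order-q Rényi entropy of a PSD spectrum after normalizing it by its sum (the trace)."""
    lam = np.clip(lam, 0.0, None)
    lam = lam / lam.sum()
    lam = lam[lam > 1e-12]
    if np.isclose(q, 1.0):
        return float(-np.sum(lam * np.log(lam)))
    return float(np.log(np.sum(lam**q)) / (1.0 - q))


def vendi_score_from_features(Phi: np.ndarray, q: float = 1.0) -> float:
    """Vendi score of K = Phi Phi^T, computed from the smaller D x D matrix Phi^T Phi."""
    return float(np.exp(entropy_from_spectrum(np.linalg.eigvalsh(Phi.T @ Phi), q)))


rng = np.random.default_rng(6)
X = rng.normal(size=(400, 2))
sq = np.sum(X**2, axis=1)
K_exact = np.exp(-np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0) / 2.0)  # bandwidth 1
exact = float(np.exp(entropy_from_spectrum(np.linalg.eigvalsh(K_exact))))  # 400 x 400 eigenproblem
approx = vendi_score_from_features(rff_features(X, bandwidth=1.0, num_features=100))  # 100 x 100
print(exact, approx)
```

The approximate score is capped by the feature dimension $D$, so $D$ must be large enough to capture the kernel's effective rank; how close `approx` lands to `exact` depends on that choice.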

6. Empirical Validation and Applications

Conditional and information Vendi scores find use in:

Active learning: VIG provides a globally-aware selection criterion outperforming max-entropy, BALD, and CoreSet for ecological image labeling, including on the Snapshot Serengeti dataset (3.2M images, 10-class task). The greedy-batch VIG algorithm is empirically robust, scalable, and more label-efficient—achieving 5–7% higher accuracy over max-entropy/BALD, 3% over CoreSet with 30% fewer labels, and faster convergence (Nguyen et al., 12 Sep 2025).
Text-conditioned generation: Conditional-Vendi isolates model-induced from prompt-induced diversity in text-to-image, text-to-video, and captioning tasks, while Information-Vendi quantifies prompt-sample alignment (statistical relevance). Empirically, only Conditional-Vendi remains constant when diversity is manipulated solely through the prompts; Information-Vendi is prompt-sensitive. Comparisons of modern generative models on COCO- and ImageNet-derived prompts confirm that the scores' verdicts agree with perceived diversity and relevance (Jalali et al., 2024).
Probabilistic calibration and grouping losses: The information Vendi term quantifies non-recoverable information loss from lossy summary predictions or feature–score compressions, and the conditional Vendi term precisely measures calibration error at a given information level (Charpentier et al., 16 Mar 2026).

In each case, the ability to isolate internal (model-driven) and external (prompt- or condition-driven) diversity, and to separate calibration from information loss, is unattainable with classical entropy or mutual information alone.

7. Extensions, Limitations, and Open Directions

Extensions of Vendi-based scores include:

Rényi-entropy versions supporting differential entropy for continuous spaces (Nguyen et al., 12 Sep 2025).
Multi-view/multi-modal extensions via multi-way kernel products and conditional decompositions.
Incorporation of labeling costs, demographic regularization, or diversity suppression/encouragement during model training (Jalali et al., 2024, Nguyen et al., 12 Sep 2025).

Limitations stem from computational cost for large $n$, embedding-induced biases (score quality is bounded by representational fidelity), and the restrictive assumptions (e.g., cluster separation, kernel normalization) underpinning certain theoretical guarantees (Jalali et al., 2024). In practice, efficient large-scale eigendecomposition remains an area for future development.

A plausible implication is that, as embeddings and similarity measures grow more accurate, matrix-based Vendi decompositions will become increasingly central for diagnosing, regularizing, and optimizing diversity and information flows in complex generative and predictive systems.


References (4)

Vendi Information Gain: An Alternative To Mutual Information For Science And Machine Learning (2025)

Conditional Vendi Score: An Information-Theoretic Approach to Diversity Evaluation of Prompt-based Generative Models (2024)

Vendi Information Gain for Active Learning and its Application to Ecology (2025)

Decomposing Probabilistic Scores: Reliability, Information Loss and Uncertainty (2026)
