
Conditional Vendi Score: Diversity Metric

Updated 5 March 2026
  • The Conditional Vendi Score is an information‐theoretic metric that isolates internal generative diversity from prompt-induced variations using kernel-based Rényi entropies.
  • It decomposes overall diversity into conditional diversity (model-induced) and mutual information (prompt-induced), enabling nuanced model diagnostics.
  • Its computation uses feature extraction and kernel matrix eigendecomposition, though large datasets may require efficient approximations to manage computational cost.

The Conditional Vendi Score is an information-theoretic metric for quantifying diversity within the outputs of prompt-based generative models. It is specifically designed to disentangle internal (model-induced) diversity from prompt-induced diversity. Classical diversity measures are tailored to unconditional generation. The Conditional Vendi Score instead decomposes total output diversity into a part attributable to the model and a part attributable to the prompts. This supports finer analysis of text-conditioned generation of images, videos, or other modalities. The approach relies on matrix-based (kernel) Rényi entropies computed from embeddings of both the input prompts and the outputs, and it supports both empirical evaluation and theoretical analysis (Jalali et al., 2024).

1. Mathematical Foundation: Kernel-Based Entropy and the Vendi Scores

Let $X = \{x_1,\ldots,x_n\}$ be $n$ generated samples in $\mathbb{R}^d$. A positive semi-definite kernel $k_X(x,x')$ (e.g., Gaussian RBF) is selected and scaled so that $k_X(x,x)=1$. The $n\times n$ kernel matrix is $K_X(i,j) = k_X(x_i, x_j)$. The normalized matrix $(1/n)K_X$ has nonnegative eigenvalues $\lambda_1, \ldots, \lambda_n$ summing to $1$. The order-$\alpha$ ($\alpha>0$, $\alpha\neq1$) matrix-based Rényi entropy is defined as:

$$H_\alpha(X) = \frac{1}{1-\alpha} \log\left( \sum_{i=1}^n \lambda_i^\alpha \right).$$

In the limit $\alpha\to1$, this reduces to the Shannon entropy of the eigenvalues, $H_1(X) = -\sum_{i=1}^n \lambda_i \log \lambda_i$. The (unconditional) Vendi Score is given by:

$$\mathrm{Vendi}_\alpha(X) = \exp(H_\alpha(X)).$$
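The two formulas above can be sketched in a few lines of NumPy. This is a minimal illustration that assumes the kernel matrix has already been computed with unit diagonal. The names `renyi_entropy` and `vendi_score` are hypothetical and are not taken from any published package, and the `1e-12` eigenvalue cutoff is an arbitrary choice for discarding round-off.

```python
import numpy as np

def renyi_entropy(K: np.ndarray, alpha: float = 1.0) -> float:
    """Matrix-based Renyi entropy of order alpha for a PSD kernel matrix with unit diagonal."""
    n = K.shape[0]
    # Eigenvalues of (1/n) K sum to 1; clip small negative values caused by round-off.
    lam = np.clip(np.linalg.eigvalsh(K / n), 0.0, None)
    lam = lam[lam > 1e-12]  # drop numerically zero eigenvalues (0 log 0 := 0)
    if np.isclose(alpha, 1.0):
        # alpha -> 1 limit: Shannon entropy of the eigenvalues.
        return float(-np.sum(lam * np.log(lam)))
    return float(np.log(np.sum(lam ** alpha)) / (1.0 - alpha))

def vendi_score(K: np.ndarray, alpha: float = 1.0) -> float:
    """Vendi_alpha(X) = exp(H_alpha(X))."""
    return float(np.exp(renyi_entropy(K, alpha)))
```

In this sketch, an identity kernel over $n$ samples (no two samples similar) gives a score of $n$ for any $\alpha$. An all-ones kernel (all samples identical) gives $1$. This matches the usual reading of the Vendi Score as an effective number of distinct samples.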

2. Conditional Vendi Score: Decomposition and Interpretation

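In prompt-based generation, a second variable $T$ (the set of text prompts) is introduced, with its own kernel $k_T(t, t')$ and matrix $K_T$. The core decomposition is obtained via the joint kernel $K_X \odot K_T$ (Hadamard product), normalized as before. Both kernels have unit diagonal, so $K_X \odot K_T$ also has unit diagonal. By the Schur product theorem it is positive semi-definite. The eigenvalues of $(1/n)(K_X \odot K_T)$ are therefore again nonnegative and sum to $1$. The matrix-based Rényi entropies for the output ($X$), the prompt ($T$), and their joint ($X,T$) are: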

$$H_\alpha(X),\quad H_\alpha(T),\quad H_\alpha(X,T) = H_\alpha\left( \tfrac{1}{n}(K_X \odot K_T) \right)$$

The Conditional Vendi Score builds on the matrix-based conditional entropy of Giraldo et al.:

$$H_\alpha(X \mid T) = H_\alpha(X,T) - H_\alpha(T)$$

Its exponential,

$$\mathrm{Conditional\text{-}Vendi}_\alpha(X \mid T) = \exp(H_\alpha(X \mid T)),$$

quantifies the internal (model-induced) diversity: the average variety of XX for fixed TT, thus excluding prompt-induced variations.

A companion metric, the Information-Vendi Score, uses matrix-based mutual information to capture how strongly the outputs depend on the prompts (prompt-induced diversity):

$$I_\alpha(X; T) = H_\alpha(X) + H_\alpha(T) - H_\alpha(X,T), \qquad \mathrm{Information\text{-}Vendi}_\alpha(X;T) = \exp(I_\alpha(X; T)).$$

These quantities satisfy the exact factorization:

$$H_\alpha(X) = H_\alpha(X \mid T) + I_\alpha(X; T) \qquad \text{and} \qquad \mathrm{Vendi}_\alpha(X) = \mathrm{Conditional\text{-}Vendi}_\alpha(X \mid T) \cdot \mathrm{Information\text{-}Vendi}_\alpha(X; T)$$

(Jalali et al., 2024).
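Substituting the two definitions shows that the factorization holds by construction. The single-prompt case makes the interpretation concrete. If every sample shares one prompt, then $K_T = \mathbf{1}\mathbf{1}^\top$ and $K_X \odot K_T = K_X$:

```latex
\begin{aligned}
H_\alpha(X \mid T) + I_\alpha(X;T)
  &= \bigl[H_\alpha(X,T) - H_\alpha(T)\bigr]
   + \bigl[H_\alpha(X) + H_\alpha(T) - H_\alpha(X,T)\bigr]
   = H_\alpha(X), \\[4pt]
K_T = \mathbf{1}\mathbf{1}^\top
  &\;\Rightarrow\; \operatorname{eig}\bigl(\tfrac{1}{n}K_T\bigr) = (1, 0, \ldots, 0)
  \;\Rightarrow\; H_\alpha(T) = 0,
  \qquad K_X \odot K_T = K_X \;\Rightarrow\; H_\alpha(X,T) = H_\alpha(X), \\[4pt]
  &\;\Rightarrow\; \mathrm{Conditional\text{-}Vendi}_\alpha(X \mid T) = \mathrm{Vendi}_\alpha(X),
  \qquad \mathrm{Information\text{-}Vendi}_\alpha(X;T) = 1.
\end{aligned}
```

In other words, with a single prompt all measured diversity is attributed to the model.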

3. Algorithmic Implementation and Parameter Settings

For a sample set of $n$ prompt-output pairs $\{(t_i, x_i)\}_{i=1}^n$, the following procedure enables practical computation:

  1. Extract prompt features $\phi_T(t_i)$ and output features $\phi_X(x_i)$ (e.g., with CLIP, DINOv2).
  2. Compute Gaussian kernels:

     $$K_T[i,j] = \exp\left(-\|\phi_T(t_i) - \phi_T(t_j)\|^2 / (2\sigma_T^2)\right)$$

     $$K_X[i,j] = \exp\left(-\|\phi_X(x_i) - \phi_X(x_j)\|^2 / (2\sigma_X^2)\right)$$

  3. Form $K_{X,T} = K_X \odot K_T$ and normalize all matrices by $1/n$.
  4. Compute the eigenvalues of each normalized matrix.
  5. Evaluate the entropies $H_\alpha(X)$, $H_\alpha(T)$, $H_\alpha(X,T)$ as above.
  6. Obtain $H_\alpha(X \mid T)$ and $I_\alpha(X;T)$, then exponentiate to get the Vendi-style scores.

Parameter choices include the kernel bandwidths $\sigma_X$ and $\sigma_T$ (set by the median heuristic or variance-based criteria) and the order $\alpha$ (typically $1$ or $2$).
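The six steps and the median heuristic can be combined into one end-to-end sketch. It rests on several assumptions:

- The feature matrices `phi_T` and `phi_X` (one row per prompt-output pair) are precomputed by some encoder such as CLIP or DINOv2.
- The bandwidth uses one common form of the median heuristic, $\sigma$ = median pairwise Euclidean distance. Other variants exist.
- All function names are hypothetical.

```python
import numpy as np

def pairwise_sq_dists(F: np.ndarray) -> np.ndarray:
    """Squared Euclidean distances between the rows of F (n x d)."""
    sq = np.sum(F ** 2, axis=1)
    return np.maximum(sq[:, None] + sq[None, :] - 2.0 * F @ F.T, 0.0)

def median_bandwidth(F: np.ndarray) -> float:
    """One form of the median heuristic: sigma = median pairwise distance."""
    d2 = pairwise_sq_dists(F)
    iu = np.triu_indices(F.shape[0], k=1)
    med = float(np.sqrt(np.median(d2[iu])))
    return med if med > 0 else 1.0  # fallback when all rows coincide

def gaussian_kernel(F: np.ndarray, sigma: float) -> np.ndarray:
    """Step 2: Gaussian kernel with unit diagonal."""
    K = np.exp(-pairwise_sq_dists(F) / (2.0 * sigma ** 2))
    np.fill_diagonal(K, 1.0)  # enforce k(x, x) = 1 exactly despite round-off
    return K

def renyi_entropy(K: np.ndarray, alpha: float) -> float:
    """Steps 4-5: entropy from the eigenvalues of (1/n) K."""
    n = K.shape[0]
    lam = np.clip(np.linalg.eigvalsh(K / n), 0.0, None)
    lam = lam[lam > 1e-12]
    if np.isclose(alpha, 1.0):
        return float(-np.sum(lam * np.log(lam)))
    return float(np.log(np.sum(lam ** alpha)) / (1.0 - alpha))

def conditional_vendi_pipeline(phi_T, phi_X, alpha=1.0, sigma_T=None, sigma_X=None):
    if sigma_T is None:
        sigma_T = median_bandwidth(phi_T)
    if sigma_X is None:
        sigma_X = median_bandwidth(phi_X)
    K_T = gaussian_kernel(phi_T, sigma_T)
    K_X = gaussian_kernel(phi_X, sigma_X)
    K_XT = K_X * K_T  # step 3: Hadamard (elementwise) product
    h_x, h_t, h_xt = (renyi_entropy(K, alpha) for K in (K_X, K_T, K_XT))
    # Step 6: conditional entropy and mutual information, exponentiated.
    return {
        "vendi": float(np.exp(h_x)),
        "conditional_vendi": float(np.exp(h_xt - h_t)),
        "information_vendi": float(np.exp(h_x + h_t - h_xt)),
    }
```

You can pass explicit `sigma_T` and `sigma_X` values to bypass the heuristic. This exact version builds dense $n \times n$ kernels and runs full eigendecompositions, so it has the cost profile described in the next paragraph.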

Computational complexity is dominated by $O(n^2 d)$ distance calculations and $O(n^3)$ eigenanalysis. For large $n$, approximate methods such as Random Fourier Features or Nyström extensions are applicable. Empirical studies are typically performed for $n \approx 2{,}000$ to $30{,}000$ (Jalali et al., 2024).
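One way to realise the Random Fourier Features approximation is sketched below. It is an illustration under assumptions, not the procedure of Jalali et al.

- Each sample is mapped to $2D$ paired cosine/sine features scaled by $1/\sqrt{D}$. This gives $z(x)^\top z(x) = 1$ exactly and $z(x)^\top z(x') \approx k(x,x')$ for the Gaussian kernel.
- The nonzero eigenvalues of $(1/n)ZZ^\top$ coincide with those of the $2D \times 2D$ matrix $(1/n)Z^\top Z$. The eigendecomposition cost therefore drops from $O(n^3)$ to roughly $O(nD^2 + D^3)$.
- For the joint kernel $K_X \odot K_T$, a row-wise Kronecker product of the two feature maps reproduces the Hadamard product exactly, because $\langle a\otimes b, c\otimes d\rangle = \langle a,c\rangle\langle b,d\rangle$. Its dimension grows as $(2D_X)(2D_T)$, so smaller $D$ values are needed there.

The function names and the default `num_freqs=256` are hypothetical.

```python
import numpy as np

def rff_features(F: np.ndarray, sigma: float, num_freqs: int = 256, seed: int = 0) -> np.ndarray:
    """Random Fourier features approximating exp(-||x - x'||^2 / (2 sigma^2)).

    Paired cos/sin features scaled by 1/sqrt(num_freqs) give every row unit norm,
    so the approximate kernel keeps k(x, x) = 1 exactly.
    """
    rng = np.random.default_rng(seed)
    # Frequencies drawn from N(0, sigma^-2 I), the spectral density of the Gaussian kernel.
    W = rng.normal(scale=1.0 / sigma, size=(F.shape[1], num_freqs))
    proj = F @ W
    return np.hstack([np.cos(proj), np.sin(proj)]) / np.sqrt(num_freqs)

def joint_features(Z_X: np.ndarray, Z_T: np.ndarray) -> np.ndarray:
    """Row-wise Kronecker product: Gram matrix equals (Z_X Z_X^T) * (Z_T Z_T^T) elementwise."""
    n = Z_X.shape[0]
    return np.einsum("ni,nj->nij", Z_X, Z_T).reshape(n, -1)

def approx_vendi(Z: np.ndarray, alpha: float = 1.0) -> float:
    """Vendi score from an explicit feature map, using the small (m x m) matrix (1/n) Z^T Z."""
    n = Z.shape[0]
    # Nonzero eigenvalues of (1/n) Z Z^T equal those of (1/n) Z^T Z.
    lam = np.clip(np.linalg.eigvalsh(Z.T @ Z / n), 0.0, None)
    lam = lam[lam > 1e-12]
    if np.isclose(alpha, 1.0):
        h = -np.sum(lam * np.log(lam))
    else:
        h = np.log(np.sum(lam ** alpha)) / (1.0 - alpha)
    return float(np.exp(h))
```

With these helpers, Conditional-Vendi can be estimated as `approx_vendi(joint_features(Z_X, Z_T)) / approx_vendi(Z_T)`, since $\exp(H(X,T) - H(T)) = \exp(H(X,T)) / \exp(H(T))$.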

4. Empirical Studies and Comparative Analyses

Empirical results demonstrate the diagnostic power of the Conditional Vendi Score:

  • Dog-breed and animal type experiments: As prompt specificity increases (e.g., from "a photo of a dog" to "a photo of a beagle"), unconditional Vendi Scores rise, but Conditional-Vendi remains approximately constant. This indicates that the additional diversity comes from the prompts rather than from the model.
  • Text-to-image model comparison: When prompts are grouped into $k$ clusters and each cluster is associated with a single image, Information-Vendi (prompt-induced diversity) is high while Conditional-Vendi is low. For real models, the balance between the two scores indicates whether diversity or prompt alignment dominates.
  • Text-to-video and image captioning: Conditional-Vendi and Information-Vendi align with visual inspections for diversity and prompt relevance across models.
  • Ablation studies: Sequential substitution of generated samples with random samples leads to a monotonic decline in Information-Vendi, reflecting reduced prompt-image association (Jalali et al., 2024).
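The $k$-cluster scenario can be reproduced in an idealized form with 0/1 kernels. The sketch assumes, hypothetically, that prompts in the same cluster are indistinguishable ($k_T = 1$ within a cluster, $0$ across clusters). It then compares two cases: each cluster mapped to one fixed image, and every sample being a distinct image. It is a toy illustration of the mechanics, not the paper's experiment, and the helper names are hypothetical.

```python
import numpy as np

def renyi_entropy(K: np.ndarray, alpha: float = 1.0) -> float:
    n = K.shape[0]
    lam = np.clip(np.linalg.eigvalsh(K / n), 0.0, None)
    lam = lam[lam > 1e-12]  # drop numerically zero eigenvalues
    if np.isclose(alpha, 1.0):
        return float(-np.sum(lam * np.log(lam)))
    return float(np.log(np.sum(lam ** alpha)) / (1.0 - alpha))

def same_label_kernel(labels: np.ndarray) -> np.ndarray:
    """Hypothetical 0/1 kernel: 1 for pairs sharing a label, 0 otherwise."""
    return (labels[:, None] == labels[None, :]).astype(float)

def scores(K_X: np.ndarray, K_T: np.ndarray, alpha: float = 1.0):
    """Return (Conditional-Vendi, Information-Vendi)."""
    h_x, h_t, h_xt = (renyi_entropy(K, alpha) for K in (K_X, K_T, K_X * K_T))
    return float(np.exp(h_xt - h_t)), float(np.exp(h_x + h_t - h_xt))

k, m = 4, 5  # k prompt clusters, m prompts per cluster
K_T = same_label_kernel(np.repeat(np.arange(k), m))

# Case A: every prompt in a cluster yields the same image, so K_X equals K_T.
cond_a, info_a = scores(K_T.copy(), K_T)
# Case B: every one of the k * m samples is a distinct image.
cond_b, info_b = scores(np.eye(k * m), K_T)
```

In case A all diversity is prompt-induced, giving Conditional-Vendi $= 1$ and Information-Vendi $= k = 4$. In case B the model adds $m = 5$ distinct outputs per cluster, giving Conditional-Vendi $= 5$ and Information-Vendi $= 4$. In both cases the product equals the Vendi Score of the outputs ($4$ and $20$).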

These results suggest that the Conditional Vendi Score provides an interpretable measure of the internal diversity contributed by the generative model, disentangled from prompt effects.

5. Theoretical Properties and Extensions

Key mathematical properties include:

  • Additivity and decomposition: Exact splitting of diversity into conditional (internal) and mutual (prompt-induced) components.
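  • Nonnegativity and reduction: For the identity (Kronecker delta) kernel $k(x,x') = \mathbb{1}[x = x']$ and $\alpha = 1$, the conditional entropy $H_1(X \mid T)$ recovers the Shannon conditional entropy of the empirical distribution. For general kernels, it captures similarity-weighted diversity.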
  • Sensitivity to embedding and kernel: Choice of feature extractor and kernel bandwidth significantly influences the resulting scores.
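The reduction for the identity (Kronecker delta) kernel can be checked numerically. With $k(x,x') = \mathbb{1}[x=x']$, the eigenvalues of $(1/n)K$ are exactly the empirical label frequencies, and the Hadamard product of two delta kernels is the delta kernel on (output, prompt) pairs. The kernel-based conditional entropy at $\alpha = 1$ therefore equals the plug-in Shannon conditional entropy. The toy labels and helper names below are hypothetical.

```python
import numpy as np
from collections import Counter
from math import log

def shannon_entropy_from_kernel(K: np.ndarray) -> float:
    """H_1 from the eigenvalues of (1/n) K."""
    n = K.shape[0]
    lam = np.clip(np.linalg.eigvalsh(K / n), 0.0, None)
    lam = lam[lam > 1e-12]
    return float(-np.sum(lam * np.log(lam)))

def delta_kernel(labels) -> np.ndarray:
    """Kronecker delta kernel: 1.0 if two labels are equal, else 0.0."""
    return np.array([[1.0 if a == b else 0.0 for b in labels] for a in labels])

def empirical_entropy(labels) -> float:
    """Plug-in Shannon entropy of the empirical label distribution."""
    n = len(labels)
    return -sum((c / n) * log(c / n) for c in Counter(labels).values())

outputs = ["cat", "cat", "dog", "dog", "dog", "bird"]
prompts = ["p1", "p1", "p1", "p2", "p2", "p2"]

K_X, K_T = delta_kernel(outputs), delta_kernel(prompts)
kernel_cond = shannon_entropy_from_kernel(K_X * K_T) - shannon_entropy_from_kernel(K_T)
plugin_cond = empirical_entropy(list(zip(outputs, prompts))) - empirical_entropy(prompts)
```

Both quantities equal $\log 3 - \tfrac{2}{3}\log 2 \approx 0.6365$. This is the entropy of the $(2/3, 1/3)$ split that each prompt induces over its outputs.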

The framework is extensible to diverse modalities, including video (with spatio-temporal features), audio (via suitable kernels), and multi-modal outputs. In generative adversarial training contexts, the Conditional Vendi Score could be incorporated as a regularizer to encourage high internal diversity. Open questions include optimal selection of the order parameter $\alpha$ and fully automated kernel parameterization (Jalali et al., 2024).

6. Limitations and Practical Considerations

Limitations of the Conditional Vendi Score encompass:

  • Computational scaling: $O(n^2)$ memory and $O(n^3)$ compute for kernel matrices and eigendecompositions, though low-rank approximations offer relief for large-scale applications.
  • Bandwidth and embedding choice: Sensitivity to feature space and kernel bandwidth parameters; different choices can substantially affect scores.
  • Single-statistic nature: The score is a global aggregate measure and does not localize diversity to specific samples.
  • No direct assessment of fidelity: The Conditional Vendi framework measures diversity only; fidelity to the data distribution or realism is not directly encoded but could potentially be balanced via reference-based metrics (Jalali et al., 2024).

7. Relation to Other Methods

The Conditional Vendi Score fits into a broader landscape of information-theoretic and kernel-based analysis tools for generative modeling:

  • It generalizes matrix-based entropy approaches, notably extending the unconditional Vendi Score of Friedman & Dieng and the matrix-based conditional entropy of Giraldo et al.
  • Compared to density-based mutual information, the Conditional Vendi Score (and its associated Vendi Information Gain functional) can be computed directly from pairwise similarities between samples. This sidesteps the need for parametric models or tractable densities (Nguyen et al., 13 May 2025).
  • Recent work has exploited Conditional and Contextualized Vendi Scores in diverse settings, from out-of-distribution detection (Pasarkar et al., 10 Feb 2026) and guided diffusion sampling for bias mitigation (Hemmat et al., 2024) to score distillation in conditional denoiser training (Peng et al., 11 Jun 2025).

The Conditional Vendi Score thus provides a modern, flexible, and theoretically grounded framework for diversity evaluation in prompt-conditioned generative modeling, enabling nuanced analysis and model diagnostics in high-dimensional and complex generative tasks.
