Conditional Vendi Score: Diversity Metric
- The Conditional Vendi Score is an information-theoretic metric that isolates internal generative diversity from prompt-induced variations using kernel-based Rényi entropies.
- It decomposes overall diversity into conditional diversity (model-induced) and mutual information (prompt-induced), enabling nuanced model diagnostics.
- Its computation uses feature extraction and kernel matrix eigendecomposition, though large datasets may require efficient approximations to manage computational cost.
The Conditional Vendi Score is an information-theoretic metric for quantifying diversity within the outputs of prompt-based generative models, specifically designed to disentangle internal (model-induced) diversity from prompt-induced diversity. Unlike classical diversity measures, which are tailored to unconditional generation, the Conditional Vendi Score provides an exact decomposition that attributes output variability either to variation across prompts or to the model's own variability given a prompt. This enables deeper analysis of generative model behavior in text-conditioned generation of images, videos, or other modalities. The approach relies on matrix-based (kernel) Rényi entropies computed from embeddings of both input prompts and outputs, providing practical tools for empirical evaluation and theoretical analysis (Jalali et al., 2024).
1. Mathematical Foundation: Kernel-Based Entropy and the Vendi Scores
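Let $x_1, \dots, x_n$ be generated samples in $\mathcal{X}$. A positive semi-definite kernel $k : \mathcal{X} \times \mathcal{X} \to \mathbb{R}$ (e.g., Gaussian RBF) is selected and scaled so $k(x, x) = 1$ for all $x \in \mathcal{X}$. The kernel matrix is $K_X = [k(x_i, x_j)]_{i,j=1}^{n}$. The normalized matrix $\frac{1}{n} K_X$ has nonnegative eigenvalues $\lambda_1, \dots, \lambda_n$ summing to $1$. The order-$\alpha$ ($\alpha > 0$, $\alpha \neq 1$) matrix-based Rényi entropy is defined as:
$$H_\alpha(X) = \frac{1}{1-\alpha} \log \Big( \sum_{i=1}^{n} \lambda_i^{\alpha} \Big).$$
For $\alpha \to 1$, this limits to the Shannon entropy of the eigenvalues, $H_1(X) = -\sum_{i=1}^{n} \lambda_i \log \lambda_i$. The (unconditional) Vendi Score is given by:
$$\mathrm{Vendi}_\alpha(X) = \exp\big( H_\alpha(X) \big).$$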
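The definitions above translate into a few lines of NumPy. The sketch below is a minimal illustration, not a reference implementation: it assumes a symmetric kernel matrix `K` with unit diagonal, and the function names `renyi_entropy` and `vendi_score` are hypothetical. Eigenvalues below a small threshold are dropped so that $0 \log 0$ is treated as $0$.

```python
import numpy as np

def renyi_entropy(K: np.ndarray, alpha: float = 1.0) -> float:
    """Order-alpha matrix-based Renyi entropy of an n x n kernel matrix with
    unit diagonal, computed from the eigenvalues of K / n (which sum to 1)."""
    n = K.shape[0]
    lam = np.linalg.eigvalsh(K / n)        # real eigenvalues of a symmetric matrix
    lam = np.clip(lam, 0.0, None)          # clip tiny negative round-off
    lam = lam[lam > 1e-12]                 # convention: 0 * log 0 = 0
    if np.isclose(alpha, 1.0):
        return float(-np.sum(lam * np.log(lam)))  # Shannon limit alpha -> 1
    return float(np.log(np.sum(lam ** alpha)) / (1.0 - alpha))

def vendi_score(K: np.ndarray, alpha: float = 1.0) -> float:
    """Vendi score: exponential of the matrix-based Renyi entropy."""
    return float(np.exp(renyi_entropy(K, alpha)))

# Sanity checks with n = 4 samples:
print(vendi_score(np.eye(4)))         # mutually dissimilar samples: about 4
print(vendi_score(np.ones((4, 4))))   # identical samples: about 1
```

The score behaves like an effective number of distinct samples: it ranges from $1$ (all samples identical under the kernel) to $n$ (all samples mutually dissimilar).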
2. Conditional Vendi Score: Decomposition and Interpretation
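In prompt-based generation, a second variable $T$ (the set of text prompts $t_1, \dots, t_n$) is introduced, with its own unit-diagonal kernel $k_T$ and matrix $K_T = [k_T(t_i, t_j)]_{i,j=1}^{n}$. The core decomposition is obtained via the joint kernel $K_X \odot K_T$ (Hadamard product of the output and prompt kernel matrices), which again has unit diagonal and is normalized by $1/n$ as before. The matrix-based Rényi entropies for the output ($X$), prompt ($T$), and their joint ($X, T$) are:
$$H_\alpha(X) = \frac{1}{1-\alpha} \log \operatorname{tr}\Big[ \big( \tfrac{1}{n} K_X \big)^{\alpha} \Big], \quad H_\alpha(T) = \frac{1}{1-\alpha} \log \operatorname{tr}\Big[ \big( \tfrac{1}{n} K_T \big)^{\alpha} \Big], \quad H_\alpha(X, T) = \frac{1}{1-\alpha} \log \operatorname{tr}\Big[ \big( \tfrac{1}{n} K_X \odot K_T \big)^{\alpha} \Big].$$
The Conditional Vendi Score is based on matrix-based conditional entropy (following Giraldo et al.):
$$H_\alpha(X \mid T) = H_\alpha(X, T) - H_\alpha(T).$$
Its exponential,
$$\mathrm{Cond\text{-}Vendi}_\alpha(X \mid T) = \exp\big( H_\alpha(X \mid T) \big),$$
quantifies the internal (model-induced) diversity: the average variety of outputs $X$ for a fixed prompt $T$, thus excluding prompt-induced variations.
A companion metric, the Information-Vendi Score, captures the statistical relevance (prompt-induced diversity) via matrix-based mutual information:
$$I_\alpha(X; T) = H_\alpha(X) + H_\alpha(T) - H_\alpha(X, T), \qquad \mathrm{Info\text{-}Vendi}_\alpha(X; T) = \exp\big( I_\alpha(X; T) \big).$$
These quantities satisfy the exact factorization:
$$\mathrm{Vendi}_\alpha(X) = \mathrm{Cond\text{-}Vendi}_\alpha(X \mid T) \cdot \mathrm{Info\text{-}Vendi}_\alpha(X; T),$$
since $H_\alpha(X \mid T) + I_\alpha(X; T) = H_\alpha(X)$.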
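Because every score is an exponentiated entropy, the factorization can be checked numerically. The minimal sketch below uses hypothetical helper names, and synthetic Gaussian features stand in for real prompt and output embeddings. It computes the three entropies with the Hadamard-product joint kernel and confirms that Conditional-Vendi times Information-Vendi reproduces the unconditional Vendi score.

```python
import numpy as np

def renyi_entropy(K, alpha=1.0):
    """Matrix-based Renyi entropy from the eigenvalues of K / n (unit-diagonal K)."""
    lam = np.clip(np.linalg.eigvalsh(K / K.shape[0]), 0.0, None)
    lam = lam[lam > 1e-12]
    if np.isclose(alpha, 1.0):
        return float(-np.sum(lam * np.log(lam)))
    return float(np.log(np.sum(lam ** alpha)) / (1.0 - alpha))

def decompose(K_X, K_T, alpha=1.0):
    """Return Vendi, Conditional-Vendi and Information-Vendi for paired kernels."""
    H_X = renyi_entropy(K_X, alpha)
    H_T = renyi_entropy(K_T, alpha)
    H_XT = renyi_entropy(K_X * K_T, alpha)   # elementwise (Hadamard) product
    return {
        "vendi": float(np.exp(H_X)),
        "cond_vendi": float(np.exp(H_XT - H_T)),         # exp H(X | T)
        "info_vendi": float(np.exp(H_X + H_T - H_XT)),   # exp I(X; T)
    }

def rbf(F, sigma):
    """Gaussian kernel matrix with unit diagonal."""
    D2 = np.sum((F[:, None, :] - F[None, :, :]) ** 2, axis=-1)
    return np.exp(-D2 / (2.0 * sigma ** 2))

rng = np.random.default_rng(0)
T_feats = rng.normal(size=(50, 4))                   # stand-in prompt embeddings
X_feats = T_feats + 0.5 * rng.normal(size=(50, 4))   # outputs correlated with prompts
s = decompose(rbf(X_feats, 2.0), rbf(T_feats, 2.0))
print(s, s["cond_vendi"] * s["info_vendi"])          # product matches s["vendi"]
```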
3. Algorithmic Implementation and Parameter Settings
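For a sample set of prompt-output pairs $\{(t_i, x_i)\}_{i=1}^{n}$, the following procedure enables practical computation:
- Extract prompt features $\phi_T(t_i)$ and output features $\phi_X(x_i)$ (e.g., with CLIP, DINOv2).
- Compute Gaussian kernels $k_X(x_i, x_j) = \exp\big(-\|\phi_X(x_i) - \phi_X(x_j)\|^2 / (2\sigma_X^2)\big)$ and $k_T(t_i, t_j) = \exp\big(-\|\phi_T(t_i) - \phi_T(t_j)\|^2 / (2\sigma_T^2)\big)$.
- Form $K_X$, $K_T$, and $K_X \odot K_T$, and normalize all three matrices by $1/n$.
- Compute the eigenvalues of each normalized matrix.
- Evaluate the entropies $H_\alpha(X)$, $H_\alpha(T)$, $H_\alpha(X, T)$ as above.
- Obtain $H_\alpha(X \mid T)$ and $I_\alpha(X; T)$; exponentiate to get the Conditional-Vendi and Information-Vendi scores.
Parameter choices include the kernel bandwidths $\sigma_X$, $\sigma_T$ (set using the median heuristic or variance-based criteria) and the order $\alpha$ (typically $1$ or $2$).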
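The procedure and parameter choices above can be strung together into an end-to-end sketch. It assumes features have already been extracted (random arrays stand in for CLIP or DINOv2 embeddings), uses one common variant of the median heuristic (bandwidth equal to the median pairwise Euclidean distance), and uses hypothetical function names throughout; it illustrates the steps under those assumptions and is not the authors' code.

```python
import numpy as np

def pairwise_sq_dists(F: np.ndarray) -> np.ndarray:
    """Squared Euclidean distances between the rows of F (n x d)."""
    sq = np.sum(F ** 2, axis=1)
    D2 = sq[:, None] + sq[None, :] - 2.0 * F @ F.T
    np.fill_diagonal(D2, 0.0)
    return np.clip(D2, 0.0, None)          # remove small negative round-off

def median_bandwidth(D2: np.ndarray) -> float:
    """Assumed median-heuristic variant: sigma = median pairwise distance."""
    iu = np.triu_indices(D2.shape[0], k=1)
    sigma = float(np.median(np.sqrt(D2[iu])))
    return sigma if sigma > 0 else 1.0     # guard against duplicate-only data

def gaussian_kernel(F: np.ndarray, sigma=None) -> np.ndarray:
    D2 = pairwise_sq_dists(F)
    if sigma is None:
        sigma = median_bandwidth(D2)
    return np.exp(-D2 / (2.0 * sigma ** 2))   # unit diagonal: k(x, x) = 1

def renyi_entropy(K: np.ndarray, alpha: float = 1.0) -> float:
    lam = np.clip(np.linalg.eigvalsh(K / K.shape[0]), 0.0, None)
    lam = lam[lam > 1e-12]
    if np.isclose(alpha, 1.0):
        return float(-np.sum(lam * np.log(lam)))
    return float(np.log(np.sum(lam ** alpha)) / (1.0 - alpha))

def conditional_vendi(prompt_feats, output_feats, alpha=1.0, sigma_T=None, sigma_X=None):
    """Hypothetical end-to-end helper returning the three Vendi-style scores."""
    K_T = gaussian_kernel(prompt_feats, sigma_T)
    K_X = gaussian_kernel(output_feats, sigma_X)
    H_T = renyi_entropy(K_T, alpha)
    H_X = renyi_entropy(K_X, alpha)
    H_XT = renyi_entropy(K_X * K_T, alpha)    # joint kernel via Hadamard product
    return {
        "vendi": float(np.exp(H_X)),
        "cond_vendi": float(np.exp(H_XT - H_T)),
        "info_vendi": float(np.exp(H_X + H_T - H_XT)),
    }

# Random stand-ins for prompt and output embeddings (hypothetical data, n = 100).
rng = np.random.default_rng(0)
prompt_feats = rng.normal(size=(100, 16))
output_feats = prompt_feats @ rng.normal(size=(16, 32)) + 0.3 * rng.normal(size=(100, 32))
print(conditional_vendi(prompt_feats, output_feats, alpha=1.0))
```

Passing explicit `sigma_T` or `sigma_X` overrides the median heuristic, which makes it easy to probe how sensitive the scores are to the bandwidth.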
Computational complexity is dominated by the $O(n^2 d)$ pairwise-distance calculations (for $d$-dimensional features) and the $O(n^3)$ eigendecompositions. For large $n$, approximate methods such as Random Fourier Features or Nyström extensions are applicable. Empirical studies are typically performed for $n$ between … and … samples (Jalali et al., 2024).
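One way to apply Random Fourier Features here, sketched under stated assumptions: approximate each Gaussian kernel as $K \approx Z Z^\top$ with $D$ random features, build joint features for the Hadamard product from the row-wise Kronecker product (since $(z_i^\top z_j)(w_i^\top w_j) = (z_i \otimes w_i)^\top (z_j \otimes w_j)$), and take eigenvalues of the $D \times D$ matrix $Z^\top Z$, which shares its nonzero eigenvalues with the $n \times n$ Gram matrix. Normalization uses the trace rather than $1/n$, because random features give a unit diagonal only approximately. The function names and the choices $D = 16$, $\sigma = 2$ are hypothetical; accuracy improves as $D$ grows, and the joint dimension $D_X D_T$ must stay well below $n$ for this route to save work.

```python
import numpy as np

def rff_features(F, sigma, D, rng):
    """Random Fourier features: z(f) . z(g) approximates exp(-||f - g||^2 / (2 sigma^2))."""
    W = rng.normal(scale=1.0 / sigma, size=(F.shape[1], D))  # spectral samples
    b = rng.uniform(0.0, 2.0 * np.pi, size=D)                # random phases
    return np.sqrt(2.0 / D) * np.cos(F @ W + b)

def joint_features(Z_X, Z_T):
    """Row-wise Kronecker product, so that Z_J Z_J^T = (Z_X Z_X^T) * (Z_T Z_T^T)."""
    n = Z_X.shape[0]
    return np.einsum("ia,ib->iab", Z_X, Z_T).reshape(n, -1)

def entropy_from_features(Z, alpha=1.0):
    """Renyi entropy of Z Z^T / tr(Z Z^T), computed from the D x D matrix Z^T Z,
    which has the same nonzero eigenvalues as the n x n Gram matrix."""
    C = Z.T @ Z
    lam = np.clip(np.linalg.eigvalsh(C / np.trace(C)), 0.0, None)
    lam = lam[lam > 1e-12]
    if np.isclose(alpha, 1.0):
        return float(-np.sum(lam * np.log(lam)))
    return float(np.log(np.sum(lam ** alpha)) / (1.0 - alpha))

rng = np.random.default_rng(0)
n = 2000
prompt_feats = rng.normal(size=(n, 8))                        # stand-in embeddings
output_feats = prompt_feats + 0.5 * rng.normal(size=(n, 8))
Z_T = rff_features(prompt_feats, sigma=2.0, D=16, rng=rng)
Z_X = rff_features(output_feats, sigma=2.0, D=16, rng=rng)
Z_J = joint_features(Z_X, Z_T)                                # 16 * 16 = 256 features

H_X, H_T, H_XT = (entropy_from_features(Z) for Z in (Z_X, Z_T, Z_J))
print("approx Cond-Vendi:", np.exp(H_XT - H_T))
print("approx Info-Vendi:", np.exp(H_X + H_T - H_XT))
```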
4. Empirical Studies and Comparative Analyses
Empirical results demonstrate the diagnostic power of the Conditional Vendi Score:
- Dog-breed and animal type experiments: As prompt specificity increases (e.g., from "a photo of a dog" to "a photo of a beagle"), unconditional Vendi Scores rise, but Conditional-Vendi remains constant, confirming that additional diversity arises only from prompts.
- Text-to-image model comparison: In a synthetic setup where prompts are grouped into clusters and each cluster is associated with a single fixed image, Information-Vendi (prompt-induced diversity) is high but Conditional-Vendi is low. For real models, the relative values of the two scores reveal whether internal diversity or prompt alignment dominates.
- Text-to-video and image captioning: Conditional-Vendi and Information-Vendi align with visual inspections for diversity and prompt relevance across models.
- Ablation studies: Sequential substitution of generated samples with random samples leads to a monotonic decline in Information-Vendi, reflecting reduced prompt-image association (Jalali et al., 2024).
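The mechanism behind the cluster experiment can be seen in an idealized toy setting; this is not the paper's experiment, and the sizes and Kronecker-delta kernels are assumptions. Treat $k = 4$ prompt clusters of $m = 5$ prompts as identical under the prompt kernel, and compare three hypothetical output patterns: one fixed image per cluster, all outputs distinct, and outputs that cycle through $m$ images regardless of the prompt. The three cases give Conditional-Vendi of about $1$, $5$, $5$ and Information-Vendi of about $4$, $4$, $1$: the first mirrors the high-Information/low-Conditional pattern described above, and the last shows Information-Vendi collapsing when outputs carry no information about the prompt.

```python
import numpy as np

def renyi_entropy(K, alpha=1.0):
    """Matrix-based Renyi entropy from the eigenvalues of K / n (unit-diagonal K)."""
    lam = np.clip(np.linalg.eigvalsh(K / K.shape[0]), 0.0, None)
    lam = lam[lam > 1e-12]
    if np.isclose(alpha, 1.0):
        return float(-np.sum(lam * np.log(lam)))
    return float(np.log(np.sum(lam ** alpha)) / (1.0 - alpha))

def scores(K_X, K_T, alpha=1.0):
    H_X, H_T, H_XT = (renyi_entropy(K, alpha) for K in (K_X, K_T, K_X * K_T))
    return {"cond_vendi": float(np.exp(H_XT - H_T)),
            "info_vendi": float(np.exp(H_X + H_T - H_XT))}

def delta_kernel(labels):
    """Kronecker-delta kernel: 1 for equal labels, 0 otherwise."""
    return (labels[:, None] == labels[None, :]).astype(float)

k, m = 4, 5                                   # hypothetical: 4 clusters x 5 prompts (n = 20)
prompt_cluster = np.repeat(np.arange(k), m)   # [0,0,0,0,0,1,1,...]
K_T = delta_kernel(prompt_cluster)            # prompts in a cluster treated as identical

# (a) One fixed image per cluster: no internal diversity.
one_image = scores(delta_kernel(prompt_cluster), K_T)
# (b) Every output distinct: m distinct images within each prompt cluster.
all_distinct = scores(np.eye(k * m), K_T)
# (c) Outputs ignore the prompt: image index cycles 0..m-1 inside every cluster.
ignores_prompt = scores(delta_kernel(np.tile(np.arange(m), k)), K_T)

for name, s in [("one image", one_image), ("all distinct", all_distinct),
                ("ignores prompt", ignores_prompt)]:
    print(name, round(s["cond_vendi"], 3), round(s["info_vendi"], 3))
```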
These results show that the Conditional Vendi Score provides a rigorous and interpretable measure of the internal diversity contributed by the generative model, disentangled from prompt effects.
5. Theoretical Properties and Extensions
Key mathematical properties include:
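- Additivity and decomposition: The entropy splits exactly as $H_\alpha(X) = H_\alpha(X \mid T) + I_\alpha(X; T)$, dividing diversity into conditional (internal) and mutual (prompt-induced) components; equivalently, the corresponding scores multiply.
- Nonnegativity and reduction: For the identity (Kronecker-delta) kernel $k(x, x') = \mathbb{1}[x = x']$ on discrete data, the conditional Vendi entropy recovers the Shannon conditional entropy; for general kernels, it captures similarity-weighted diversity.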
- Sensitivity to embedding and kernel: Choice of feature extractor and kernel bandwidth significantly influences the resulting scores.
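The reduction to Shannon conditional entropy can be checked on discrete data. With the Kronecker-delta kernel, the eigenvalues of $\frac{1}{n} K$ are the empirical class frequencies, and the Hadamard product of two delta kernels is the delta kernel on (output, prompt) pairs. The toy labels below are made up for illustration, and the helper names are hypothetical.

```python
import numpy as np

def shannon_from_kernel(K):
    """Order-1 matrix-based entropy from the eigenvalues of K / n."""
    lam = np.clip(np.linalg.eigvalsh(K / K.shape[0]), 0.0, None)
    lam = lam[lam > 1e-12]
    return float(-np.sum(lam * np.log(lam)))

def delta_kernel(labels):
    """Kronecker-delta kernel: k(a, b) = 1 if a == b else 0."""
    return (labels[:, None] == labels[None, :]).astype(float)

def shannon_from_counts(counts):
    p = counts / counts.sum()
    return float(-np.sum(p * np.log(p)))

x = np.array([0, 0, 1, 1, 1, 2, 2, 0])   # hypothetical discrete outputs
t = np.array([0, 0, 0, 1, 1, 1, 1, 1])   # hypothetical discrete prompts

# Kernel route: H(X | T) = H(X, T) - H(T) with the Hadamard-product joint kernel.
H_cond_kernel = (shannon_from_kernel(delta_kernel(x) * delta_kernel(t))
                 - shannon_from_kernel(delta_kernel(t)))

# Counting route: empirical Shannon conditional entropy.
_, pair_counts = np.unique(np.stack([x, t], axis=1), axis=0, return_counts=True)
_, t_counts = np.unique(t, return_counts=True)
H_cond_counts = shannon_from_counts(pair_counts) - shannon_from_counts(t_counts)

print(H_cond_kernel, H_cond_counts)   # agree up to floating-point round-off
```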
The framework is extendable to diverse modalities, including video (with spatio-temporal features), audio (via suitable kernels), and multi-modal outputs. In generative adversarial training contexts, the Conditional Vendi Score can be incorporated directly as a regularizer to encourage high internal diversity. Open questions include optimal selection of the order parameter and fully automated kernel parameterization (Jalali et al., 2024).
6. Limitations and Practical Considerations
Limitations of the Conditional Vendi Score include:
- Computational scaling: $O(n^2)$ memory and $O(n^3)$ compute for kernel matrices and eigendecompositions, though low-rank approximations offer relief for large-scale applications.
- Bandwidth and embedding choice: Sensitivity to feature space and kernel bandwidth parameters; different choices can substantially affect scores.
- Single-statistic nature: The score is a global aggregate measure and does not localize diversity to specific samples.
- No direct assessment of fidelity: The Conditional Vendi framework measures diversity only; fidelity to the data distribution or realism is not directly encoded but could potentially be balanced via reference-based metrics (Jalali et al., 2024).
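On the computational-scaling point, the order $\alpha = 2$ is a convenient special case: for a symmetric matrix $A$, $\operatorname{tr}(A^2) = \|A\|_F^2$, so $H_2 = -\log \|\tfrac{1}{n} K\|_F^2$ needs no eigendecomposition, and the order-2 Conditional-Vendi reduces to $\|K_T\|_F^2 / \|K_X \odot K_T\|_F^2$. The sketch below (hypothetical names, synthetic features) still materializes the $n \times n$ matrices, so memory remains $O(n^2)$, but compute drops from $O(n^3)$ to $O(n^2 d)$.

```python
import numpy as np

def renyi2(K):
    """Order-2 matrix Renyi entropy: -log tr((K/n)^2) = -log ||K/n||_F^2 for symmetric K."""
    n = K.shape[0]
    return float(-np.log(np.sum((K / n) ** 2)))

def gaussian_kernel(F, sigma):
    sq = np.sum(F ** 2, axis=1)
    D2 = np.clip(sq[:, None] + sq[None, :] - 2.0 * F @ F.T, 0.0, None)
    np.fill_diagonal(D2, 0.0)
    return np.exp(-D2 / (2.0 * sigma ** 2))

rng = np.random.default_rng(0)
prompt_feats = rng.normal(size=(500, 8))                        # stand-in embeddings
output_feats = prompt_feats + 0.5 * rng.normal(size=(500, 8))
K_T = gaussian_kernel(prompt_feats, 2.0)
K_X = gaussian_kernel(output_feats, 2.0)

# Cond-Vendi_2 = exp(H_2(X,T) - H_2(T)) = ||K_T||_F^2 / ||K_X * K_T||_F^2
cond_vendi_2 = np.exp(renyi2(K_X * K_T) - renyi2(K_T))
info_vendi_2 = np.exp(renyi2(K_X) + renyi2(K_T) - renyi2(K_X * K_T))
print(cond_vendi_2, info_vendi_2)
```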
7. Broader Context and Related Methodologies
The Conditional Vendi Score integrates into a broader landscape of information-theoretic and kernel-based analysis tools for generative modeling:
- It generalizes matrix-based entropy approaches, notably extending the unconditional Vendi Score of Friedman & Dieng and the matrix-based conditional entropy of Giraldo et al.
- Compared to classical density-based estimates of mutual information, the Conditional Vendi Score (and its associated Vendi Information Gain functional) is computed directly from pairwise similarities between samples, sidestepping the need for parametric models or tractable densities (Nguyen et al., 13 May 2025).
- Recent work has exploited Conditional and Contextualized Vendi Scores in diverse settings, from out-of-distribution detection (Pasarkar et al., 10 Feb 2026) and guided diffusion sampling for bias mitigation (Hemmat et al., 2024) to score distillation in conditional denoiser training (Peng et al., 11 Jun 2025).
The Conditional Vendi Score thus provides a modern, flexible, and theoretically grounded framework for diversity evaluation in prompt-conditioned generative modeling, enabling nuanced analysis and model diagnostics in high-dimensional and complex generative tasks.