Diversity (β-Recall) in Generative Models
- Diversity (β-Recall) is a metric that quantifies the fraction of the real data manifold covered by generated samples, serving as a key indicator of mode coverage.
- It is typically estimated using nonparametric kNN-based or probabilistic kernel methods in semantic feature spaces like Inception-V3 or GPT2 embeddings.
- The metric informs trade-offs between sample fidelity and diversity, guiding model evaluation and optimization to mitigate issues such as mode dropping.
Diversity (β-Recall) quantifies the extent to which a generative model covers the modes, or support, of the target data distribution. In the context of generative modeling, β-Recall is a principled metric that measures the fraction of the real data manifold captured by the generated samples, and is thus a canonical evaluation of diversity or mode coverage. It is widely employed in the assessment of both image and text generators, serves as the recall axis in two-dimensional precision–recall frontiers, and is essential to diagnosing mode dropping or coverage defects even when global metrics such as FID are favorable (Kynkäänniemi et al., 2019, Sykes et al., 2 May 2024, Park et al., 2023).
1. Formal Definitions and Theoretical Foundations
Standard Definition
Given real samples $x_1, \dots, x_N$ from the reference distribution, generated samples $y_1, \dots, y_M$, and a fixed embedding $\varphi$, the β-Recall at scale $\beta > 0$ is defined as
$\mathrm{recall}(\beta) = \frac{1}{N} \sum_{i=1}^{N} \mathbf{1}\bigl[\, d_i \le \beta \,\bigr],$
where
$d_i = \min_{1 \le j \le M} \lVert \varphi(x_i) - \varphi(y_j) \rVert$
is the distance from the $i$-th real sample to its nearest generated sample. Sweeping $\beta$ yields the recall curve $\beta \mapsto \mathrm{recall}(\beta)$. The β-Recall is operationalized in two forms:
- Fixed-coverage β-Recall: For a fixed coverage level $\tau$, find the smallest $\beta^*$ such that $\mathrm{recall}(\beta^*) \ge \tau$; report either $\beta^*$ or simply note that coverage $\tau$ is achieved at this scale.
- Area under curve (AUC) β-Recall: Aggregate recall over all scales, e.g.
$\mathrm{AUC} = \int \mathrm{recall}(\beta)\, w(\beta)\, \mathrm{d}\beta,$
where the weight $w(\beta)$ can be uniform (Kynkäänniemi et al., 2019).
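A minimal numerical sketch of both forms, assuming a Euclidean metric in an already-embedded feature space; the helper names and toy Gaussian data are illustrative, not taken from the cited papers:

```python
import numpy as np

def beta_recall(real, fake, beta):
    """recall(beta): fraction of real embeddings within distance beta
    of at least one generated embedding."""
    # Pairwise Euclidean distances between real (N, d) and fake (M, d).
    d = np.linalg.norm(real[:, None, :] - fake[None, :, :], axis=-1)
    return float(np.mean(d.min(axis=1) <= beta))

def auc_beta_recall(real, fake, betas):
    """Uniform-weight aggregate of the recall curve over the grid `betas`."""
    return float(np.mean([beta_recall(real, fake, b) for b in betas]))

rng = np.random.default_rng(0)
real = rng.normal(size=(200, 8))
fake = rng.normal(size=(200, 8))   # same distribution -> high recall at modest beta
print(beta_recall(real, fake, 2.0))
print(auc_beta_recall(real, fake, np.linspace(0.0, 5.0, 51)))
```

The quadratic-memory pairwise distance matrix is fine at this scale; production estimators typically use a k-d tree or approximate nearest-neighbor search instead.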
Precision–Recall Curve Theory
The unifying formalism of Simon et al. (2019) parameterizes the precision–recall (PR) frontier between $P$ (real) and $Q$ (model) by a scalar $\lambda > 0$; in the discrete case,
$\alpha(\lambda) = \sum_{x} \min\bigl(\lambda P(x),\, Q(x)\bigr), \qquad \beta(\lambda) = \sum_{x} \min\bigl(P(x),\, Q(x)/\lambda\bigr) = \alpha(\lambda)/\lambda,$
with $\lambda \in (0, \infty)$ tracing out the Pareto-optimal fidelity–diversity trade-off. Here, $\beta(\lambda)$ is the β-Recall at trade-off parameter $\lambda$.
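For finite histograms over a shared support, a point on this frontier can be computed directly; a sketch of the discrete formulation (the helper name is illustrative):

```python
import numpy as np

def prd_point(p, q, lam):
    """One point (alpha(lam), beta(lam)) on the PR frontier for discrete
    distributions p, q: alpha(lam) = sum_x min(lam*p(x), q(x)),
    beta(lam) = alpha(lam) / lam."""
    alpha = float(np.minimum(lam * p, q).sum())
    return alpha, alpha / lam

# Identical distributions sit at the ideal corner (1, 1) at lam = 1.
p = np.array([0.5, 0.5, 0.0])
q = np.array([0.0, 0.5, 0.5])
print(prd_point(p, p, 1.0))   # (1.0, 1.0)
print(prd_point(p, q, 1.0))   # (0.5, 0.5): half the mass overlaps
```

Sweeping `lam` over $(0, \infty)$ traces the full curve; the symmetric overlap at $\lambda = 1$ is the single most common summary point.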
2. Practical Estimation and Computational Methodology
There are two dominant empirical paradigms for estimating β-Recall:
Nonparametric kNN-based Estimation
- Feature Construction: Embed both real and generated data in a semantic feature space (e.g., Inception-V3, VGG-16 for vision; GPT2/PCA for text).
- kNN Support Estimation: For each real sample $x_i$, find its minimum distance to the generated set in feature space. The threshold $\beta$ is either swept directly or set to the $k$th-nearest-neighbor distance among the generated samples. For the recall curve, sweep either $\beta$ or $k$ (Kynkäänniemi et al., 2019, Bronnec et al., 16 Feb 2024, Khayatkhoei et al., 2023).
- Computation:
- For β-Recall at a fixed scale, compute $\mathrm{recall}(\beta)$ for a grid of thresholds $\beta$.
- For PR curve estimation, split data into train/validation sets, fit classifiers, and compute $\hat{\alpha}(\lambda), \hat{\beta}(\lambda)$ consistent with the PRD-curve theory (Sykes et al., 2 May 2024).
- Hyperparameters: number of samples ($N$, $M$), neighborhood size $k$, embedding $\varphi$, and the grid over $\beta$ or $\lambda$ (Kynkäänniemi et al., 2019, Bronnec et al., 16 Feb 2024).
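The $k$-NN variant above sets each generated sample's coverage radius to its $k$th-nearest-neighbor distance among the fakes; a hedged sketch in the style of Kynkäänniemi et al. (2019), with illustrative helper names and toy data:

```python
import numpy as np

def knn_recall(real, fake, k=3):
    """Improved-recall-style estimator: a real embedding counts as covered
    if it lies inside the k-NN hypersphere of at least one generated one."""
    # k-NN radius of each fake among the fakes (column 0 is the self-distance).
    dff = np.linalg.norm(fake[:, None, :] - fake[None, :, :], axis=-1)
    radii = np.sort(dff, axis=1)[:, k]
    # Real-to-fake distances; a real point is covered if it falls in some ball.
    drf = np.linalg.norm(real[:, None, :] - fake[None, :, :], axis=-1)
    return float((drf <= radii[None, :]).any(axis=1).mean())

rng = np.random.default_rng(1)
real = rng.normal(size=(200, 4))
fake = rng.normal(size=(200, 4))
print(knn_recall(real, fake))            # close to 1: matched supports
print(knn_recall(real, fake + 100.0))    # 0.0: disjoint supports
```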
Probabilistic/Ball-based and Kernel Estimation
- P-recall: Rather than hard thresholding, P-recall ("Probabilistic Recall") assigns a soft kernel probability $p_{ij}$ to each real–generated pair, compositing all contributions for each real sample $x_i$:
$\mathrm{P\mbox{-}recall} = \frac{1}{N} \sum_{i=1}^N \Bigl[ 1 - \prod_{j=1}^M (1 - p_{ij}) \Bigr]$
where the kernel's global scale $\sigma$ is set by the average kNN distance among the generated samples (Park et al., 2023). This method is more robust to outliers and is sensitive to both the extent and the density of the generated distribution.
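A sketch of the soft-membership idea: the aggregation $1 - \prod_j (1 - p_{ij})$ matches the formula above, but the Gaussian kernel used here is an illustrative stand-in, not the exact kernel of Park et al. (2023):

```python
import numpy as np

def p_recall(real, fake, k=4):
    """Probabilistic recall sketch: soft pairwise memberships p_ij replace
    hard beta-balls. NOTE: the Gaussian kernel is an illustrative choice;
    only the 1 - prod(1 - p_ij) aggregation follows the cited formulation."""
    # Global scale sigma: mean k-NN distance among generated samples.
    dff = np.linalg.norm(fake[:, None, :] - fake[None, :, :], axis=-1)
    sigma = np.sort(dff, axis=1)[:, k].mean()
    drf = np.linalg.norm(real[:, None, :] - fake[None, :, :], axis=-1)
    p = np.exp(-drf**2 / (2.0 * sigma**2))   # soft membership of x_i near y_j
    return float(np.mean(1.0 - np.prod(1.0 - p, axis=1)))

rng = np.random.default_rng(2)
real = rng.normal(size=(200, 4))
fake = rng.normal(size=(200, 4))
print(p_recall(real, fake))          # high: distributions match
print(p_recall(real, fake + 10.0))   # low: shifted model support
```

Because no single pair can dominate and the scale is global, a lone outlier contributes only a small $p_{ij}$ rather than a large hard-coverage ball.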
3. Interpretations, Trade-offs, and Diverse Contexts
β-Recall cleanly operationalizes diversity as the fraction of reference instances that lie inside the estimated support of the model distribution. High β-Recall across scales implies broad mode coverage and insensitivity to mode dropping. This is in contrast to precision, which corresponds to sample fidelity or quality (Kynkäänniemi et al., 2019, Sykes et al., 2 May 2024, Bronnec et al., 16 Feb 2024).
By varying the parameter $\beta$ (or the PR-curve parameter $\lambda$), one can dial the trade-off: small values of $\beta$ yield stricter matches, favoring high precision and selectivity, while large values relax the matching criterion and favor recall.
Fixed-coverage β-Recall is interpretable as the minimal scale $\beta^*$ required to cover a desired fraction of the real data modes. AUC β-Recall balances recall across all scales and can serve as a summary score.
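The fixed-coverage form has a closed-form empirical solution: since $\mathrm{recall}(\beta)$ is the empirical CDF of the real-to-nearest-generated distances, the smallest $\beta$ achieving coverage $\tau$ is simply their $\tau$-quantile. A sketch under the same Euclidean-embedding assumptions as before (the helper name is hypothetical):

```python
import numpy as np

def beta_star(real, fake, tau=0.95):
    """Smallest beta with recall(beta) >= tau. recall(beta) is the empirical
    CDF of nearest-generated distances, so beta* is their tau-quantile."""
    dmin = np.linalg.norm(real[:, None, :] - fake[None, :, :], axis=-1).min(axis=1)
    # method="higher" returns an observed distance, guaranteeing coverage.
    return float(np.quantile(dmin, tau, method="higher"))

rng = np.random.default_rng(3)
real = rng.normal(size=(300, 8))
fake = rng.normal(size=(300, 8))
b = beta_star(real, fake, tau=0.9)
dmin = np.linalg.norm(real[:, None, :] - fake[None, :, :], axis=-1).min(axis=1)
print(b, np.mean(dmin <= b))   # achieved coverage is at least 0.9
```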
In language modeling, analogous kNN and $\beta$-scaled metrics map to the distinctiveness or paraphrase diversity of generations, extending recall-style evaluation to open-ended text (Goldberg, 2023, Bronnec et al., 16 Feb 2024).
4. Failure Modes, High-Dimensional Effects, and Remedies
High-Dimensional Asymmetry
In high-dimensional regimes, standard kNN-based β-Recall degenerates due to the curse of dimensionality. It may saturate at 1 when the model support contains the real data manifold, and at 0 just outside it, regardless of actual overlap, thereby failing to capture meaningful gradations in diversity (Khayatkhoei et al., 2023). This emergent asymmetry leads to misinterpretations: e.g., small shifts of the generative support past the real data manifold's boundary can cause β-Recall to precipitously drop or rise.
The symmetric Recall, which uses the real data to define the support and checks which generated points it covers (swapping the roles of the two sample sets), restores symmetry and validity in high-dimensional tests (Khayatkhoei et al., 2023).
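A sketch of that role swap, following the description above (the helper name and toy data are illustrative, not from Khayatkhoei et al.):

```python
import numpy as np

def symmetric_recall(real, fake, k=3):
    """Role-swapped recall: k-NN balls are built on the REAL embeddings and
    we measure the fraction of GENERATED embeddings that they cover."""
    # k-NN radius of each real sample among the reals (column 0 is self-distance).
    drr = np.linalg.norm(real[:, None, :] - real[None, :, :], axis=-1)
    radii = np.sort(drr, axis=1)[:, k]
    dfr = np.linalg.norm(fake[:, None, :] - real[None, :, :], axis=-1)
    return float((dfr <= radii[None, :]).any(axis=1).mean())

rng = np.random.default_rng(4)
real = rng.normal(size=(200, 4))
print(symmetric_recall(real, rng.normal(size=(200, 4))))         # high: matched supports
print(symmetric_recall(real, rng.normal(size=(200, 4)) + 50.0))  # 0.0: shifted model
```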
Outlier Sensitivity and Robustness
kNN-based β-Recall is susceptible to sample outliers: a single outlier can expand coverage radii, falsely inflating recall. Probabilistic or kernel-based P-recall mitigates this by using soft membership and global radii, so outliers receive minimal weight (Park et al., 2023).
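The inflation effect is easy to reproduce: a single stray generated point near the real mode acquires a huge $k$-NN radius (its nearest fake neighbors are far away), and that one ball can cover most of the real data. A toy demonstration, with the kNN-style estimator redefined so the snippet stands alone:

```python
import numpy as np

def knn_recall(real, fake, k=3):
    """Hard-ball kNN recall, as in the standard estimator."""
    dff = np.linalg.norm(fake[:, None, :] - fake[None, :, :], axis=-1)
    radii = np.sort(dff, axis=1)[:, k]   # column 0 is the self-distance
    drf = np.linalg.norm(real[:, None, :] - fake[None, :, :], axis=-1)
    return float((drf <= radii[None, :]).any(axis=1).mean())

rng = np.random.default_rng(0)
real = rng.normal(size=(300, 2))
fake = rng.normal(size=(300, 2)) + 6.0   # model mass far from the real data
outlier = np.array([[0.0, 0.0]])         # one stray fake near the real mode
print(knn_recall(real, fake))                        # near 0: supports barely overlap
print(knn_recall(real, np.vstack([fake, outlier])))  # near 1: inflated by one point
```

The outlier's $k$ nearest neighbors all sit in the distant cluster, so its coverage ball is enormous; soft-kernel P-recall avoids this because the global scale stays small.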
Embedding Dependence
β-Recall is sensitive to the choice of feature embedding. Changing the embedding $\varphi$ can rescale distance thresholds and thus alter the absolute values, though relative comparisons between models remain robust if a consistent embedding is used (Kynkäänniemi et al., 2019, Bronnec et al., 16 Feb 2024).
5. Applications and Extensions
Generative Model Evaluation
β-Recall is integral to the evaluation of GANs, flows, and diffusion models. Comparing the recall and precision axes exposes the full quality–diversity spectrum. For example, mode dropping manifests as high precision but low recall; overdispersed or low-quality outputs yield the opposite. Complete PR curves reveal more nuanced trade-offs than scalar FID scores (Sykes et al., 2 May 2024, Verine et al., 2023, Kynkäänniemi et al., 2019).
Direct Optimization in Model Training
Recent work operationalizes β-Recall as a direct optimization target. The Precision–Recall Divergence, a one-parameter divergence family indexed by $\lambda$, admits minimization via adversarial training or variational estimation to explicitly steer generators toward desired regions of the PR frontier (Verine et al., 2023). Algorithms can target enhanced diversity (recall, small $\lambda$) or fidelity (precision, large $\lambda$), with an explicit and tunable trade-off.
Domain-Specific Instantiations
- Language modeling: a β-Recall analogue quantifies distinct paraphrastic or pattern coverage ("d-recall"), as in (Goldberg, 2023). Here, the setwise recall is the ratio of distinct pattern types generated to the total number in the gold corpus.
- Conformal selection: In conformal selection with candidate diversity (as in DACS), a β-Recall-style term enters the F-Recall score, which trades off diversity against selection-set size under FDR constraints (Nair et al., 19 Jun 2025).
- LLMs: Adapted to text generation, recall quantifies how much of the reference embedding support is covered, with $\beta$-scaling applied to the radii to expose trade-offs (Bronnec et al., 16 Feb 2024).
6. Limitations, Recommendations, and Best Practices
- Consistent embeddings are mandatory across model comparisons.
- Report both precision and β-Recall curves (or AUCs); scalar summaries (e.g., the minimum radius $\beta^*$, an F-score, or the PR-frontier area) compress information and should not replace full curves (Sykes et al., 2 May 2024, Kynkäänniemi et al., 2019).
- Use large sample sizes $N$, $M$ for stable estimation; moderate values of $k$ (around 3 or 4) balance local and global sensitivity.
- Outlier robustness: prefer probabilistic P-recall or symmetric Recall in high-dimensional spaces (Park et al., 2023, Khayatkhoei et al., 2023).
- In language applications, augment evaluations with both pattern diversity (d-recall/β-Recall) and exhaustiveness (e-recall) (Goldberg, 2023).
7. Comparative Table of β-Recall Formulations
| Reference | Definition / Key Formulation | Notable Context |
|---|---|---|
| (Kynkäänniemi et al., 2019) | Fraction of real samples within a β-ball of a generated sample; AUC or fixed-coverage variants | Image GANs, StyleGAN, BigGAN |
| (Sykes et al., 2 May 2024) | $\beta(\lambda)$ from the PRD curve | Universal PR analysis |
| (Park et al., 2023) | P-recall: probabilistic kernel over all model samples | Outlier-robust diversity |
| (Khayatkhoei et al., 2023) | Symmetric recall: support defined on real data, coverage checked on generated points | High-dimensional regime |
| (Goldberg, 2023) | d-recall: fraction of distinct covered template types | Information extraction |
| (Nair et al., 19 Jun 2025) | F-Recall: diversity traded off against selection-set size | Conformal selection |
| (Bronnec et al., 16 Feb 2024) | Fraction of reference support inside generated kNN balls, with optional β-scaling | LLMs, text diversity |
References
- Improved Precision and Recall Metric for Assessing Generative Models (Kynkäänniemi et al., 2019)
- Unifying and extending Precision Recall metrics for assessing generative models (Sykes et al., 2 May 2024)
- Precision-Recall Divergence Optimization for Generative Modeling with GANs and Normalizing Flows (Verine et al., 2023)
- Probabilistic Precision and Recall Towards Reliable Evaluation of Generative Models (Park et al., 2023)
- Emergent Asymmetry of Precision and Recall for Measuring Fidelity and Diversity of Generative Models in High Dimensions (Khayatkhoei et al., 2023)
- Two Kinds of Recall (Goldberg, 2023)
- Diversifying Conformal Selections (Nair et al., 19 Jun 2025)
- Exploring Precision and Recall to assess the quality and diversity of LLMs (Bronnec et al., 16 Feb 2024)