Diversity (β-Recall) in Generative Models

Updated 28 November 2025
  • Diversity (β-Recall) is a metric that quantifies the fraction of the real data manifold covered by generated samples, serving as a key indicator of mode coverage.
  • It is typically estimated using nonparametric kNN-based or probabilistic kernel methods in semantic feature spaces like Inception-V3 or GPT2 embeddings.
  • The metric informs trade-offs between sample fidelity and diversity, guiding model evaluation and optimization to mitigate issues such as mode dropping.

Diversity ($\beta$-Recall) quantifies the extent to which a generative model covers the modes, or the support, of the target data distribution. In generative modeling, $\beta$-Recall is a principled metric that measures the fraction of the real data manifold captured by the generated samples, and is thus a canonical evaluation of diversity or mode coverage. It is widely employed in assessing both image and text generators, serves as the recall axis in two-dimensional precision–recall frontiers, and is essential for diagnosing mode dropping or coverage defects even when global metrics such as FID are favorable (Kynkäänniemi et al., 2019, Sykes et al., 2 May 2024, Park et al., 2023).

1. Formal Definitions and Theoretical Foundations

Standard Definition

Given $N$ real samples $X = \{x_i\}_{i=1}^N$ from the reference distribution and $M$ generated samples $Y = \{y_j\}_{j=1}^M$, and a fixed embedding $f(\cdot)$, the $\varepsilon$-Recall is defined as

$$\mathrm{Recall}(\varepsilon) = \frac{1}{N} \sum_{i=1}^N I_{\varepsilon}(x_i; Y),$$

where

$$I_\varepsilon(x_i; Y) = \begin{cases} 1, & \min_{y_j \in Y} \|f(x_i) - f(y_j)\| \leq \varepsilon, \\ 0, & \text{otherwise}. \end{cases}$$

Sweeping $\varepsilon$ yields the recall curve $R(\varepsilon)$. The $\beta$-Recall is operationalized in two forms:

  • Fixed-coverage $\beta$-Recall: For fixed $\beta \in (0,1)$, find the smallest $\varepsilon_\beta$ such that $R(\varepsilon_\beta) \geq \beta$; report either $\varepsilon_\beta$ or simply note that $\beta$ is achieved at this scale.
  • Area under curve (AUC) $\beta$-Recall: Aggregate recall over all scales, e.g.

$$\mathrm{AUC} = \int_0^{\varepsilon_{\max}} R(\varepsilon)\, w(\varepsilon)\, d\varepsilon,$$

where $w(\varepsilon)$ can be uniform (Kynkäänniemi et al., 2019).
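The definitions above translate directly into code. The following is a minimal illustrative sketch (not any paper's reference implementation), assuming `real_feats` and `gen_feats` are precomputed embeddings $f(x_i)$ and $f(y_j)$ stored as NumPy arrays; a brute-force pairwise distance matrix is adequate for small $N$, $M$:

```python
import numpy as np

def recall_at_eps(real_feats, gen_feats, eps):
    """Recall(eps): fraction of real points within eps of some generated point.

    real_feats: (N, d) array of embedded real samples f(x_i).
    gen_feats:  (M, d) array of embedded generated samples f(y_j).
    """
    # Pairwise distances between each real point and every generated point, shape (N, M).
    d = np.linalg.norm(real_feats[:, None, :] - gen_feats[None, :, :], axis=-1)
    # I_eps(x_i; Y) = 1 iff the nearest generated point lies within eps.
    return float(np.mean(d.min(axis=1) <= eps))

def recall_curve(real_feats, gen_feats, eps_grid):
    """Sweep eps to obtain R(eps); AUC uses a trapezoidal rule with uniform w(eps)."""
    curve = np.array([recall_at_eps(real_feats, gen_feats, e) for e in eps_grid])
    widths = np.diff(eps_grid)
    auc = np.sum((curve[1:] + curve[:-1]) / 2.0 * widths) / (eps_grid[-1] - eps_grid[0])
    return curve, auc
```

Fixed-coverage $\beta$-Recall then falls out of the curve: `eps_grid[np.argmax(curve >= beta)]` returns the smallest grid point $\varepsilon_\beta$ with $R(\varepsilon_\beta) \geq \beta$, assuming the grid extends far enough for $\beta$ to be reached.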

Precision–Recall Curve Theory

The unifying formalism of Simon et al. (2019) parameterizes the precision–recall (PR) frontier between distributions $P$ (real) and $Q$ (model) by a scalar $\lambda > 0$,

$$\alpha_\lambda(P, Q) = \int \min(\lambda\, p(x), q(x))\, dx, \qquad \beta_\lambda(P, Q) = \frac{\alpha_\lambda(P, Q)}{\lambda},$$

with $(\beta_\lambda, \alpha_\lambda)$ tracing out the Pareto-optimal fidelity–diversity trade-off. Here, $\beta_\lambda$ is the $\beta$-Recall at trade-off parameter $\lambda$.
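For intuition, the frontier can be traced numerically once $P$ and $Q$ are discretized onto shared bins (e.g., cluster assignments, a common way to estimate PRD curves in practice). A minimal sketch under that assumption, with illustrative function names:

```python
import numpy as np

def prd_frontier(p, q, lambdas):
    """Trace (beta_lambda, alpha_lambda) for discrete distributions on shared bins.

    alpha_lambda = sum_i min(lambda * p_i, q_i)   (precision at trade-off lambda)
    beta_lambda  = alpha_lambda / lambda          (recall at trade-off lambda)
    """
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    p, q = p / p.sum(), q / q.sum()  # normalize to probability vectors
    lambdas = np.asarray(lambdas, dtype=float)
    alphas = np.array([np.minimum(lam * p, q).sum() for lam in lambdas])
    return alphas / lambdas, alphas  # (beta_lambda, alpha_lambda) pairs
```

Identical distributions give $(\beta_1, \alpha_1) = (1, 1)$ at $\lambda = 1$, while disjoint supports collapse the whole frontier to zero.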

2. Practical Estimation and Computational Methodology

There are two dominant empirical paradigms for estimating $\beta$-Recall:

Nonparametric kNN-based Estimation

  • Feature Construction: Embed both real and generated data in a semantic feature space (e.g., Inception-V3, VGG-16 for vision; GPT2/PCA for text).
  • kNN Support Estimation: For each $x_i$, find its minimum distance to the generated set in feature space. $\varepsilon$ is swept or set to the $k$th-nearest neighbor's distance. For the recall curve, either sweep $k$ or $\varepsilon$ (Kynkäänniemi et al., 2019, Bronnec et al., 16 Feb 2024, Khayatkhoei et al., 2023).
  • Computation:
    • For $\beta$-Recall at fixed $\beta$, compute $R(\varepsilon_k)$ for a grid of thresholds.
    • For PR curve estimation, split data into train/validation sets, fit classifiers, and compute $(\beta_\lambda, \alpha_\lambda)$ consistent with the PRD-curve theory (Sykes et al., 2 May 2024).
  • Hyperparameters: Number of samples ($N$, $M$), $k$, embedding $f$, grid over $\varepsilon$ or $\lambda$ (Kynkäänniemi et al., 2019, Bronnec et al., 16 Feb 2024).
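The steps above can be condensed into a short sketch of the kNN estimator, assuming Euclidean distances in an already-embedded feature space. This is an illustrative rendering of the idea, not a reference implementation:

```python
import numpy as np

def knn_recall(real_feats, gen_feats, k=4):
    """kNN recall: fraction of real points inside the estimated support of the
    generated distribution, where the support is a union of balls centred on
    generated points, each with radius equal to that point's k-th nearest-
    neighbour distance within the generated set."""
    # Pairwise distances within the generated set, shape (M, M).
    dgg = np.linalg.norm(gen_feats[:, None, :] - gen_feats[None, :, :], axis=-1)
    # k-th NN radius per generated point (index k skips the self-distance 0).
    radii = np.sort(dgg, axis=1)[:, k]
    # Distances from each real point to every generated point, shape (N, M).
    drg = np.linalg.norm(real_feats[:, None, :] - gen_feats[None, :, :], axis=-1)
    # A real point counts as covered if it falls inside at least one ball.
    return float(np.mean((drg <= radii[None, :]).any(axis=1)))
```

For large sample sizes, the quadratic-memory distance matrices would normally be replaced by a batched or tree-based nearest-neighbour search.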

Probabilistic/Ball-based and Kernel Estimation

  • P-recall: Rather than hard thresholding, $P$-recall (or "Probabilistic Recall") assigns a soft kernel $p_{ij} = \max\{0,\, 1 - \|x_i - y_j\|/R\}$ between real and generated pairs, compositing all contributions for each $x_i$:

$$\mathrm{P\text{-}recall} = \frac{1}{N} \sum_{i=1}^N \Bigl[ 1 - \prod_{j=1}^M (1 - p_{ij}) \Bigr],$$

where $R$ is a global scale set by the average kNN distance among fakes (Park et al., 2023). This method is more robust to outliers and is sensitive to the extent and density of the generated distribution.
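A minimal sketch of this computation, assuming the global radius $R$ is the average $k$-th nearest-neighbour distance within the generated set (function and variable names are illustrative):

```python
import numpy as np

def p_recall(real_feats, gen_feats, k=4):
    """P-recall sketch: soft kernel membership with one global radius R, so a
    single outlier cannot inflate coverage the way per-point kNN balls can."""
    # Global radius R: average k-th NN distance within the generated set.
    dgg = np.linalg.norm(gen_feats[:, None, :] - gen_feats[None, :, :], axis=-1)
    R = np.sort(dgg, axis=1)[:, k].mean()
    # Soft membership p_ij = max(0, 1 - ||x_i - y_j|| / R), shape (N, M).
    drg = np.linalg.norm(real_feats[:, None, :] - gen_feats[None, :, :], axis=-1)
    p = np.clip(1.0 - drg / R, 0.0, None)
    # Probability that x_i is covered by at least one generated sample.
    return float(np.mean(1.0 - np.prod(1.0 - p, axis=1)))
```

Because every pair contributes through the same global scale $R$, an isolated outlier among the fakes only shifts $R$ slightly instead of carving out its own large coverage ball.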

3. Interpretations, Trade-offs, and Diverse Contexts

$\beta$-Recall cleanly operationalizes diversity as the fraction of reference instances that lie inside the estimated support of the model distribution. High $\beta$-Recall across scales implies broad mode coverage and insensitivity to mode dropping. This is in contrast to precision, which corresponds to sample fidelity or quality (Kynkäänniemi et al., 2019, Sykes et al., 2 May 2024, Bronnec et al., 16 Feb 2024).

By varying the parameter $\varepsilon$ (or $\lambda$ as the PR curve parameter), one can dial the trade-off: small $\varepsilon$ (or small $\lambda$) yields stricter matches, favoring high precision and selectivity, while large values relax matching and favor recall.

Fixed $\beta$-Recall is interpretable via the minimal scale required to cover a desired fraction of real data modes. AUC $\beta$-Recall balances recall across all scales and can serve as a summary score.

In language modeling, analogous kNN and $\beta$-scaled metrics map to the distinctiveness or paraphrase diversity of generations, extending recall-style evaluation to open-ended text (Goldberg, 2023, Bronnec et al., 16 Feb 2024).

4. Failure Modes, High-Dimensional Effects, and Remedies

High-Dimensional Asymmetry

In high-dimensional regimes, standard kNN-based $\beta$-Recall degenerates due to the curse of dimensionality. It may saturate at 1 when the model support contains the real data manifold, and at 0 just outside it, regardless of actual overlap, thereby failing to capture meaningful gradations in diversity (Khayatkhoei et al., 2023). This emergent asymmetry invites misinterpretation: small shifts of the generative support past the real data manifold's boundary can cause $\beta$-Recall to drop or rise precipitously.

The symmetric Recall, $\mathrm{symRecall}(P, Q) = \min\{\mathrm{Recall}(P, Q), \mathrm{cRecall}(P, Q)\}$, where $\mathrm{cRecall}$ uses real data to define the support and checks which generated points are covered, restores symmetry and validity in high-dimensional tests (Khayatkhoei et al., 2023).
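A sketch of symmetric recall under a standard kNN-ball support estimate, assuming Euclidean distances in feature space (an illustration of the definition, not the authors' code):

```python
import numpy as np

def coverage(points, support_samples, k=4):
    """Fraction of `points` inside the kNN-ball support estimated from
    `support_samples` (each ball's radius = its centre's k-th NN distance)."""
    dss = np.linalg.norm(support_samples[:, None, :] - support_samples[None, :, :], axis=-1)
    radii = np.sort(dss, axis=1)[:, k]  # k-th NN radius per support sample
    dps = np.linalg.norm(points[:, None, :] - support_samples[None, :, :], axis=-1)
    return float(np.mean((dps <= radii[None, :]).any(axis=1)))

def sym_recall(real_feats, gen_feats, k=4):
    """symRecall = min(Recall, cRecall): Recall checks real points against the
    generated support; cRecall swaps the roles, checking generated points
    against the support estimated from real data."""
    return min(coverage(real_feats, gen_feats, k), coverage(gen_feats, real_feats, k))
```

The minimum penalizes the one-sided failure mode: a generator whose support engulfs the real manifold plus spurious far-away mass scores perfect one-directional recall but a reduced $\mathrm{cRecall}$, and hence a reduced $\mathrm{symRecall}$.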

Outlier Sensitivity and Robustness

kNN-based recall is susceptible to sample outliers: a single outlier can expand coverage radii, falsely inflating recall. Probabilistic or kernel-based P-recall mitigates this by using soft membership and a global radius, so outliers receive minimal weight (Park et al., 2023).

Embedding Dependence

$\beta$-Recall is sensitive to the choice of feature embedding. Changing $f(\cdot)$ can rescale distance thresholds and thus alter the absolute $\varepsilon$ values, though relative comparisons between models remain robust if a consistent embedding is used (Kynkäänniemi et al., 2019, Bronnec et al., 16 Feb 2024).

5. Applications and Extensions

Generative Model Evaluation

$\beta$-Recall is integral to the evaluation of GANs, flows, and diffusion models. Comparing the recall and precision axes exposes the full quality–diversity spectrum: mode dropping manifests as high precision but low recall, while overdispersed or low-quality outputs yield the opposite. Complete PR curves reveal more nuanced trade-offs than scalar FID scores (Sykes et al., 2 May 2024, Verine et al., 2023, Kynkäänniemi et al., 2019).

Direct Optimization in Model Training

Recent work operationalizes $\beta$-Recall as a direct optimization target. The Precision–Recall Divergence ($D^{PR}_\lambda$), a one-parameter $f$-divergence family, admits minimization via adversarial training or variational estimation to explicitly steer generators toward desired regions of the PR frontier (Verine et al., 2023). Algorithms can target enhanced diversity (recall, small $\lambda$) or fidelity (precision, large $\lambda$), with an explicit and tunable trade-off.

Domain-Specific Instantiations

  • Language modeling: $\beta$-Recall quantifies distinct paraphrastic or pattern coverage ("d-recall") as in (Goldberg, 2023). Here, the setwise recall is the ratio of distinct pattern types generated to the total in the gold corpus.
  • Conformal selection: In conformal selection and candidate diversity (as in DACS), $\beta$-Recall appears in the $\mathrm{F}_\beta$-Recall score, $\mathrm{F}_\beta = (1+\beta^2)\,|S\cap H| / (\beta^2 |H| + |S|)$, trading off diversity against set size under FDR constraints (Nair et al., 19 Jun 2025).
  • LLMs: Adapted to text generation, recall quantifies how much of the reference embedding support is covered, with $\beta$-scaling applied to radii for trade-offs (Bronnec et al., 16 Feb 2024).
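As a concrete illustration of the $\mathrm{F}_\beta$-Recall formula from the conformal-selection setting, assuming the selected set $S$ and reference set $H$ are finite collections of discrete candidates (names are hypothetical):

```python
def f_beta_recall(selected, held_out, beta=1.0):
    """F_beta-Recall = (1 + beta^2) * |S ∩ H| / (beta^2 * |H| + |S|).

    selected: chosen candidate set S; held_out: reference set H.
    Larger beta weights coverage of H (diversity) over keeping S small."""
    s, h = set(selected), set(held_out)
    denom = beta ** 2 * len(h) + len(s)
    return (1 + beta ** 2) * len(s & h) / denom if denom else 0.0
```

With $\beta = 1$ this reduces to the harmonic-mean style balance of set size against coverage; $\beta > 1$ rewards recalling more of $H$ even at the cost of a larger selection.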

6. Limitations, Recommendations, and Best Practices

  • Consistent embeddings are mandatory across model comparisons.
  • Report both precision and $\beta$-Recall curves (or AUCs); scalar summaries (e.g., minimum radius $\varepsilon_\beta$, $\mathrm{F}_\beta$-score, or PR frontier area) can collapse information and should not replace full curves (Sykes et al., 2 May 2024, Kynkäänniemi et al., 2019).
  • Use large $N$, $M$ for stable estimation; $k$ values around 4 or $k = \sqrt{N}$ balance local and global sensitivity.
  • Outlier robustness: prefer probabilistic recall or symmetric Recall in high-dimensional spaces (Park et al., 2023, Khayatkhoei et al., 2023).
  • In language applications, augment evaluations with both pattern diversity (d-recall/$\beta$-Recall) and exhaustiveness (e-recall) (Goldberg, 2023).

7. Comparative Table of $\beta$-Recall Formulations

Reference | Definition / Key Formulation | Notable Context
(Kynkäänniemi et al., 2019) | Fraction of real samples within an $\varepsilon$-ball of a generated sample; AUC or fixed-$\beta$ | Image GANs, StyleGAN, BigGAN
(Sykes et al., 2 May 2024) | $\beta_\lambda = \alpha_\lambda/\lambda$ from the PRD curve | Universal PR analysis
(Park et al., 2023) | P-recall, probabilistic kernel over all model samples | Outlier-robust diversity
(Khayatkhoei et al., 2023) | Symmetric recall: $\min\{\mathrm{Recall}, \mathrm{cRecall}\}$ | High-dimensional regime
(Goldberg, 2023) | d-recall: fraction of distinct covered template types | Information extraction
(Nair et al., 19 Jun 2025) | $\mathrm{F}_\beta$-Recall: $(1+\beta^2)\,|S\cap H| / (\beta^2 |H| + |S|)$ | Conformal selection
(Bronnec et al., 16 Feb 2024) | Fraction of reference support inside generated kNN balls, with optional $\beta$-scaling | LLMs, text diversity
