Log Expected Empirical Prediction (LEEP)
- LEEP is a transferability metric that quantifies the expected log-likelihood a predictor, derived from a source model's outputs, assigns to target labels.
- It computes empirical joint, marginal, and conditional distributions via a single forward pass, ensuring efficient evaluation of source-target pairings.
- High LEEP scores correlate with faster convergence and better transfer performance, guiding model selection and fine-tuning decisions.
The Log Expected Empirical Prediction (LEEP) is a transferability metric for assessing how effectively the learned representations of a source classifier can be transferred to a target task without requiring any retraining. LEEP quantifies the expected log-likelihood that an empirically constructed predictor—based on the source model’s outputs and the target labels—would assign to the target labels. It is utilized for model selection, source-target pairing in transfer learning, and estimating the likely accuracy and convergence speed of downstream transfer without incurring the cost of model fine-tuning. LEEP has also been adapted under distinct mathematical formulations in both transfer learning and small area estimation. The following sections detail its formal definition, computational workflow, theoretical properties, empirical evaluation, comparative analysis, practical guidelines, and limitations.
1. Formal Definition
Given a pre-trained source classifier $\theta$ that outputs categorical probabilities over the source label set $\mathcal{Z}$, and a labeled target dataset $\mathcal{D} = \{(x_1, y_1), \ldots, (x_n, y_n)\}$ with $y_i \in \mathcal{Y}$, define:
- Dummy-label prediction: $\theta(x_i)_z$, the predicted probability of source ("dummy") label $z \in \mathcal{Z}$ for input $x_i$.
- Empirical joint over $(y, z)$: $\hat{P}(y, z) = \frac{1}{n} \sum_{i:\, y_i = y} \theta(x_i)_z$.
- Empirical marginal over $z$: $\hat{P}(z) = \sum_{y \in \mathcal{Y}} \hat{P}(y, z) = \frac{1}{n} \sum_{i=1}^{n} \theta(x_i)_z$.
- Empirical conditional of $y$ given $z$: $\hat{P}(y \mid z) = \hat{P}(y, z) / \hat{P}(z)$.
The Expected Empirical Predictor (EEP) for $y$ given $x$ is $p(y \mid x) = \sum_{z \in \mathcal{Z}} \hat{P}(y \mid z)\, \theta(x)_z$.
The LEEP score is the average log-likelihood of the EEP on $\mathcal{D}$: $T(\theta, \mathcal{D}) = \frac{1}{n} \sum_{i=1}^{n} \log \Big( \sum_{z \in \mathcal{Z}} \hat{P}(y_i \mid z)\, \theta(x_i)_z \Big)$.
LEEP values reside in $(-\infty, 0)$, with higher (less negative) values corresponding to greater expected transferability (Nguyen et al., 2020, Wong et al., 2022).
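As a worked illustration with hypothetical numbers: let $\mathcal{Z} = \{a, b\}$, $\mathcal{Y} = \{0, 1\}$, and $n = 2$ with $y_1 = 0$, $\theta(x_1) = (0.9, 0.1)$ and $y_2 = 1$, $\theta(x_2) = (0.2, 0.8)$. The joint is $\hat{P}(0, a) = 0.45$, $\hat{P}(0, b) = 0.05$, $\hat{P}(1, a) = 0.10$, $\hat{P}(1, b) = 0.40$, so the conditionals include $\hat{P}(0 \mid a) = 0.45/0.55 \approx 0.818$, $\hat{P}(0 \mid b) = 0.05/0.45 \approx 0.111$, and $\hat{P}(1 \mid b) = 0.40/0.45 \approx 0.889$. The EEP then assigns probability $0.818 \cdot 0.9 + 0.111 \cdot 0.1 \approx 0.747$ to $y_1$, and likewise $\approx 0.747$ to $y_2$, giving $T(\theta, \mathcal{D}) = \log 0.747 \approx -0.291$.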
2. Algorithmic Computation
Computing LEEP for a given source model and target set involves the following steps:
- Forward pass all $x_i$ through $\theta$ to obtain $\theta(x_i)_z$ for all $z \in \mathcal{Z}$.
- Compute the empirical joint for each pair $(y, z)$: $\hat{P}(y, z) = \frac{1}{n} \sum_{i:\, y_i = y} \theta(x_i)_z$.
- Compute marginals/conditionals: $\hat{P}(z) = \sum_{y} \hat{P}(y, z)$, $\hat{P}(y \mid z) = \hat{P}(y, z) / \hat{P}(z)$.
- Compute per-instance scores: $s_i = \log \sum_{z \in \mathcal{Z}} \hat{P}(y_i \mid z)\, \theta(x_i)_z$.
- Aggregate: $T(\theta, \mathcal{D}) = \frac{1}{n} \sum_{i=1}^{n} s_i$.
This process involves only a single forward pass of the target data through the source model, and additional arithmetic for joint distributions (Nguyen et al., 2020, Wong et al., 2022).
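A minimal NumPy sketch of these steps (the `leep` function name and input conventions are illustrative, not from the cited papers):

```python
import numpy as np

def leep(probs: np.ndarray, labels: np.ndarray) -> float:
    """LEEP score from source-model softmax outputs and target labels.

    probs:  (n, |Z|) array, row i holding theta(x_i) over the source label set.
    labels: (n,) integer array of target labels in {0, ..., |Y|-1}; every
            target class is assumed to appear at least once.
    """
    n, _ = probs.shape
    num_target = int(labels.max()) + 1

    # Empirical joint P_hat(y, z) = (1/n) * sum over {i : y_i = y} of theta(x_i)_z.
    joint = np.stack([probs[labels == y].sum(axis=0) for y in range(num_target)]) / n

    # Marginal P_hat(z) and conditional P_hat(y | z) = P_hat(y, z) / P_hat(z).
    marginal = joint.sum(axis=0)      # shape (|Z|,)
    conditional = joint / marginal    # broadcasts over rows; shape (|Y|, |Z|)

    # EEP: p(y | x_i) = sum_z P_hat(y | z) * theta(x_i)_z, then average log-likelihood.
    eep = probs @ conditional.T       # shape (n, |Y|)
    return float(np.mean(np.log(eep[np.arange(n), labels])))

if __name__ == "__main__":
    # Toy numbers from the worked example in Section 1: prints approximately -0.291.
    print(leep(np.array([[0.9, 0.1], [0.2, 0.8]]), np.array([0, 1])))
```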
3. Theoretical Properties and Bounds
LEEP is theoretically bounded between a single-label assignment baseline and the optimal (oracle) retrained classifier head:
- Upper bound: If $\theta = (w, h)$ with frozen feature extractor $w$ and head $h$, and $k^*$ is the maximum-likelihood head retrained on $\mathcal{D}$ with $w$ fixed, then $T(\theta, \mathcal{D}) \le \frac{1}{n} \sum_{i=1}^{n} \log p(y_i \mid x_i; w, k^*)$.
- Lower bound: Let $z_i = \arg\max_{z \in \mathcal{Z}} \theta(x_i)_z$ (the "hard" dummy label), and define the Negative Conditional Entropy (NCE) from the empirical joint $\bar{P}(y, z)$ of the observed pairs $(y_i, z_i)$: $\mathrm{NCE}(Y \mid Z) = \sum_{y, z} \bar{P}(y, z) \log \bar{P}(y \mid z)$. Then $T(\theta, \mathcal{D}) \ge \mathrm{NCE}(Y \mid Z) + \frac{1}{n} \sum_{i=1}^{n} \log \theta(x_i)_{z_i}$.
These theoretical results demonstrate that LEEP interpolates between “hard” assignment metrics and the best possible retrained head log-likelihood on the target (Nguyen et al., 2020).
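The NCE quantity appearing in the lower bound can be computed directly from the hard dummy labels; a minimal sketch follows (the `nce` helper name is illustrative and complements the `leep` sketch above):

```python
import numpy as np

def nce(probs: np.ndarray, labels: np.ndarray) -> float:
    """Negative Conditional Entropy -H(Y | Z) over hard dummy labels
    z_i = argmax_z theta(x_i)_z. Same input conventions as leep() above.
    """
    n = probs.shape[0]
    z_hard = probs.argmax(axis=1)    # "hard" dummy labels z_i

    # Empirical joint P_bar(y, z) of the observed (y_i, z_i) pairs.
    joint = np.zeros((int(labels.max()) + 1, probs.shape[1]))
    np.add.at(joint, (labels, z_hard), 1.0 / n)

    p_z = joint.sum(axis=0)          # marginal P_bar(z)
    with np.errstate(divide="ignore", invalid="ignore"):
        # log P_bar(y | z); cells with no mass contribute nothing to the sum.
        log_cond = np.where(joint > 0, np.log(joint / p_z), 0.0)
    return float((joint * log_cond).sum())
```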
4. Empirical Performance and Correlation with Transfer Success
LEEP performance has been evaluated in settings including large-scale visual transfer (ImageNet → CIFAR-100), small-data and imbalanced regimes, meta-transfer (CNAPs), and RF domain adaptation:
- Large data: For ImageNet→CIFAR-100 and CIFAR-10→CIFAR-100 head retraining, LEEP shows strong positive Pearson correlation with transfer accuracy.
- Few-shot/imbalanced: On small target sets (e.g., $5$ classes with only a few examples per class), LEEP correlates positively with transfer performance and remains statistically significant under label noise or class imbalance.
- Meta-transfer (CNAPs): On 200 random 5-way, 50-shot CIFAR-100 tasks, LEEP correlates positively with meta-transfer accuracy.
- RF domain adaptation: Across SNR/FO-shifted domains, LEEP correlates with head-retrain accuracy (up to $0.82$), agrees closely with LogME (correlations up to $0.90$), and selects a near-optimal source model in over 80% of cases (Nguyen et al., 2020, Wong et al., 2022).
- Convergence: Models in higher-LEEP bins converge faster and surpass scratch-trained accuracy with fewer epochs.
5. Comparative Analysis with Baselines
LEEP has been compared against Negative Conditional Entropy (NCE) [Tran et al., ICCV’19] and H score [Bao et al., ICIP’19]:
- LEEP matches or exceeds NCE's Pearson correlation in most settings, with up to 30% relative improvement in some cases (e.g., one setting improves from $0.715$ to $0.798$).
- H score frequently fails to produce statistically significant correlations (in 11 of 23 cases) and never surpasses LEEP on large-data benchmarks.
- In the representative ImageNet→CIFAR-100 head-retraining setting, LEEP attains a higher Pearson correlation than the H score's $0.924$ (Nguyen et al., 2020).
6. Practical Guidelines for Application
- Efficiency: LEEP requires only one forward pass per (source model, target data) pair.
- Minimal data: Robust to small, imbalanced, or noisy target sets, provided there are at least several examples per class to reliably estimate the empirical distributions $\hat{P}(y, z)$ and $\hat{P}(y \mid z)$.
- Utilities:
- Rank source models for transfer (model zoo selection); see the ranking sketch at the end of this section.
- Screen source-target pairings for joint/multi-task grouping.
- Guide decisions on fine-tuning necessity and anticipate convergence rates.
- Domain transferability: LEEP is not symmetric; transfer from hard→easy yields higher scores than the reverse.
- RF applications: In scenarios such as modulation classification, LEEP aligns with domain proximity (SNR/FO), and can guide rapid source selection without retraining (Nguyen et al., 2020, Wong et al., 2022).
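A brief usage sketch for the model-zoo ranking case, reusing the hypothetical `leep` function from Section 2 (all names are placeholders, not an official API):

```python
import numpy as np

def rank_source_models(
    model_outputs: dict[str, np.ndarray], labels: np.ndarray
) -> list[tuple[str, float]]:
    """Rank candidate source models by LEEP on one labeled target set, best first.

    model_outputs maps each model's name to its (n, |Z|) softmax outputs on the
    target inputs; leep() is the sketch from Section 2.
    """
    scores = {name: leep(probs, labels) for name, probs in model_outputs.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
```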
7. Limitations and Considerations
- Data sparsity: With very few examples per target class, the empirical estimates $\hat{P}(y, z)$ become unreliable, raising the variance of LEEP scores.
- Source model dependency: LEEP accuracy presumes a reasonably well-trained source classifier.
- Feature usage: LEEP operates on softmax outputs; it does not explicitly exploit intermediate feature activations.
- Architectural effects: It may not capture nuanced behaviors when fine-tuning is highly architecture- or hyperparameter-sensitive.
- Scope: Requires softmax-based source models and compatible input spaces; it does not generalize to non-classification tasks such as regression without modification (Nguyen et al., 2020, Wong et al., 2022).
LEEP stands as a theoretically grounded, computationally efficient, and empirically validated metric for assessing source model transferability across numerous domains and regimes. Its design, tight theoretical bounds, and high empirical correlation with actual transfer performance make it a practical decision metric in both research and applied transfer learning settings (Nguyen et al., 2020, Wong et al., 2022).