Log Expected Empirical Prediction (LEEP)

Updated 7 January 2026
  • LEEP is a transferability metric that quantifies the expected log-likelihood a predictor, derived from a source model's outputs, assigns to target labels.
  • It computes empirical joint, marginal, and conditional distributions via a single forward pass, ensuring efficient evaluation of source-target pairings.
  • High LEEP scores correlate with faster convergence and better transfer performance, guiding model selection and fine-tuning decisions.

The Log Expected Empirical Prediction (LEEP) is a transferability metric for assessing, without any retraining, how effectively the learned representations of a source classifier will transfer to a target task. LEEP quantifies the expected log-likelihood that an empirically constructed predictor (built from the source model's outputs and the target labels) assigns to the target labels. It is used for model selection, for source-target pairing in transfer learning, and for estimating the likely accuracy and convergence speed of downstream transfer without incurring the cost of fine-tuning. LEEP has also been adapted under distinct mathematical formulations in both transfer learning and small area estimation. The following sections detail its formal definition, computational workflow, theoretical properties, empirical evaluation, comparative analysis, practical guidelines, and limitations.

1. Formal Definition

Given a pre-trained source classifier $\theta$ outputting categorical probabilities over the source label set $\mathcal{Z}$, and a labeled target data set $\mathcal{D}=\{(x_i, y_i)\}_{i=1}^n$ with $y_i \in \mathcal{Y}$, define:

  • Dummy-label prediction: $\theta(x_i)_z := P(z \mid x_i; \theta)$ for $z \in \mathcal{Z}$.
  • Empirical joint over $(y, z)$:

$$\hat P(y, z) = \frac{1}{n} \sum_{i:\, y_i = y} \theta(x_i)_z$$

  • Empirical marginal over $z$:

$$\hat P(z) = \sum_{y \in \mathcal{Y}} \hat P(y, z) = \frac{1}{n} \sum_{i=1}^n \theta(x_i)_z$$

  • Empirical conditional of $y$ given $z$:

$$\hat P(y \mid z) = \frac{\hat P(y, z)}{\hat P(z)}$$

The Expected Empirical Predictor (EEP) for $y$ given $x_i$ is:

$$p_{\mathrm{EEP}}(y \mid x_i) = \sum_{z \in \mathcal{Z}} \hat P(y \mid z)\, \theta(x_i)_z$$

The LEEP score is the average log-likelihood of the EEP on $\mathcal{D}$:

$$T(\theta, \mathcal{D}) = \frac{1}{n} \sum_{i=1}^n \log \left( \sum_{z \in \mathcal{Z}} \hat P(y_i \mid z)\, \theta(x_i)_z \right)$$

LEEP values reside in $(-\infty, 0]$, with higher (less negative) values corresponding to greater expected transferability (Nguyen et al., 2020, Wong et al., 2022).
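To make the definition concrete, the following hand-worked toy example (with illustrative numbers, not taken from the cited papers) traces each quantity above for $n = 2$ target examples and $|\mathcal{Z}| = |\mathcal{Y}| = 2$:

```python
import math

# Toy target set: n = 2 examples; source and target label sets both {0, 1}.
# Source softmax outputs theta(x_i)_z and target labels y_i (illustrative numbers).
theta = [[0.8, 0.2],   # x_1, with target label y_1 = 0
         [0.3, 0.7]]   # x_2, with target label y_2 = 1
y = [0, 1]
n = len(y)

# Empirical joint: P_hat(y, z) = (1/n) * sum over {i : y_i = y} of theta(x_i)_z
joint = {(yy, z): sum(theta[i][z] for i in range(n) if y[i] == yy) / n
         for yy in (0, 1) for z in (0, 1)}

# Empirical marginal P_hat(z) and conditional P_hat(y | z)
marg = {z: joint[(0, z)] + joint[(1, z)] for z in (0, 1)}   # {0: 0.55, 1: 0.45}
cond = {(yy, z): joint[(yy, z)] / marg[z] for yy in (0, 1) for z in (0, 1)}

# Per-instance EEP likelihoods s_i and the LEEP score T(theta, D)
s = [sum(cond[(y[i], z)] * theta[i][z] for z in (0, 1)) for i in range(n)]
leep = sum(math.log(si) for si in s) / n   # ~ -0.468 for these numbers
```

The score lands well below 0 here because the source outputs are only loosely aligned with the target labels; a source model whose outputs perfectly separate the target classes would score near 0.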

2. Algorithmic Computation

Computing LEEP for a given source model $\theta$ and target set $\mathcal{D}$ involves the following steps:

  1. Forward pass all $x_i$ through $\theta$ to obtain $\theta(x_i)_z$ for all $i, z$.
  2. Compute the empirical joint for each $(y, z)$:

$$\hat P(y, z) \leftarrow \frac{1}{n} \sum_{i:\, y_i = y} \theta(x_i)_z$$

  3. Compute marginals and conditionals: $\hat P(z) = \sum_{y} \hat P(y, z)$, $\hat P(y \mid z) = \hat P(y, z) / \hat P(z)$.
  4. Compute per-instance scores: $s_i = \sum_{z} \hat P(y_i \mid z)\, \theta(x_i)_z$.
  5. Aggregate: $T(\theta, \mathcal{D}) = \frac{1}{n} \sum_{i=1}^n \log s_i$.

This process involves only a single forward pass of the target data through the source model, and $O(n|\mathcal{Z}|)$ additional arithmetic for the empirical distributions (Nguyen et al., 2020, Wong et al., 2022).
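The five steps can be sketched directly in NumPy. This is a minimal reference implementation written for this article, not code from the cited papers; the `eps` smoothing term is an added safeguard against taking the log of zero:

```python
import numpy as np

def leep(probs: np.ndarray, labels: np.ndarray, eps: float = 1e-12) -> float:
    """LEEP score T(theta, D).

    probs:  (n, |Z|) source-model softmax outputs theta(x_i)_z  (step 1's output)
    labels: (n,) integer target labels y_i in {0, ..., |Y| - 1}
    """
    n, num_z = probs.shape
    num_y = int(labels.max()) + 1

    # Step 2: empirical joint P_hat(y, z).
    joint = np.zeros((num_y, num_z))
    for y in range(num_y):
        joint[y] = probs[labels == y].sum(axis=0) / n

    # Step 3: marginal P_hat(z) and conditional P_hat(y | z).
    marginal = joint.sum(axis=0)              # shape (|Z|,)
    conditional = joint / (marginal + eps)    # shape (|Y|, |Z|)

    # Step 4: per-instance scores s_i = sum_z P_hat(y_i | z) * theta(x_i)_z.
    scores = (conditional[labels] * probs).sum(axis=1)

    # Step 5: aggregate as the average log-likelihood.
    return float(np.mean(np.log(scores + eps)))
```

As a quick sanity check, a source model whose one-hot outputs exactly match the target labels scores near the maximum of 0, while a uniform (uninformative) model scores the average of $\log \hat P(y_i)$.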

3. Theoretical Properties and Bounds

LEEP is theoretically bounded between a single-label assignment baseline and the optimal (oracle) retrained classifier head:

  • Upper bound: If $\theta = (w, h)$ with feature extractor $w$ and head $h$, and $k^*$ is the maximum-likelihood head retrained on $\mathcal{D}$ with $w$ frozen, then

$$T(\theta, \mathcal{D}) \leq l(w, k^*) := \max_{k} \frac{1}{n} \sum_{i} \log p(y_i \mid x_i; w, k)$$

  • Lower bound: Let $z_i = \arg\max_z \theta(x_i)_z$ (the "hard" dummy label), and define the Negative Conditional Entropy (NCE)

$$\mathrm{NCE}(Y \mid Z) = \frac{1}{n} \sum_{i=1}^n \log \hat P(y_i \mid z_i)$$

(a nonpositive quantity: the negative of the empirical conditional entropy). Then,

$$T(\theta, \mathcal{D}) \geq \mathrm{NCE}(Y \mid Z) + \frac{1}{n} \sum_{i=1}^n \log \theta(x_i)_{z_i}$$

which follows term by term, since each sum $\sum_z \hat P(y_i \mid z)\, \theta(x_i)_z$ dominates its single $z = z_i$ term.

These theoretical results demonstrate that LEEP interpolates between “hard” assignment metrics and the best possible retrained head log-likelihood on the target (Nguyen et al., 2020).
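The lower bound can be verified numerically. The sketch below (written for this article; it reuses the same soft empirical conditional $\hat P(y \mid z)$ for both quantities, and the `eps` terms are implementation safeguards) computes LEEP and its NCE-based bound on random softmax outputs:

```python
import numpy as np

def leep_and_lower_bound(probs, labels, eps=1e-12):
    """Return (LEEP score, NCE-based lower bound) for comparison."""
    n, num_z = probs.shape
    num_y = int(labels.max()) + 1

    # Empirical joint and conditional, as in the LEEP definition.
    joint = np.zeros((num_y, num_z))
    for y in range(num_y):
        joint[y] = probs[labels == y].sum(axis=0) / n
    cond = joint / (joint.sum(axis=0) + eps)

    # LEEP: average log of the soft EEP likelihoods.
    score = float(np.mean(np.log((cond[labels] * probs).sum(axis=1) + eps)))

    # Hard dummy labels z_i = argmax_z theta(x_i)_z.
    z_hard = probs.argmax(axis=1)
    nce = np.mean(np.log(cond[labels, z_hard] + eps))             # <= 0
    log_conf = np.mean(np.log(probs[np.arange(n), z_hard] + eps))
    return score, float(nce + log_conf)
```

On any synthetic data the returned score should sit at or above the bound, with the gap shrinking as the source model's output mass concentrates on the argmax label.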

4. Empirical Performance and Correlation with Transfer Success

LEEP performance has been evaluated in settings including large-scale visual transfer (ImageNet → CIFAR-100), small-data and imbalanced regimes, meta-transfer (CNAPs), and RF domain adaptation:

  • Large data: ImageNet→CIFAR-100 head-retraining, Pearson $\rho \approx 0.974$; CIFAR-10→CIFAR-100, $\rho \approx 0.982$.
  • Few-shot/imbalanced: 5 classes $\times$ 50 examples, LEEP correlates positively ($\rho \approx 0.6$–$0.8$), maintains significance with label noise or class imbalance.
  • Meta-transfer (CNAPs): On 200 random 5-way, 50-shot CIFAR-100 tasks, $\rho \approx 0.59$.
  • RF domain adaptation: Across SNR/FO-shifted domains, LEEP correlates with head-retrain accuracy ($r = 0.72$–$0.82$), correlates highly with LogME ($r \approx 0.85$–$0.90$), and selects a near-optimal source model in over 80% of cases (Nguyen et al., 2020, Wong et al., 2022).
  • Convergence: Models in higher-LEEP bins converge faster and surpass scratch-trained accuracy with fewer epochs.

5. Comparative Analysis with Baselines

LEEP has been compared against Negative Conditional Entropy (NCE) [Tran et al., ICCV’19] and H score [Bao et al., ICIP’19]:

  • LEEP matches or exceeds NCE's Pearson $\rho$ in most settings, with up to $\sim$30% relative improvement (e.g., $\rho$ improves from 0.715 to 0.798).
  • H score frequently fails to produce statistically significant correlations (11/23 cases) and never surpasses LEEP on large-data benchmarks.
  • In the representative ImageNet→CIFAR-100 head-retraining setting, LEEP achieves $\rho = 0.974$, outperforming the H score's 0.924 (Nguyen et al., 2020).

6. Practical Guidelines for Application

  • Efficiency: LEEP requires only one forward pass per (source model, target data) pair.
  • Minimal data: Robust to small, imbalanced, or noisy target sets provided there are at least several examples per class to reliably estimate the empirical $\hat P(y \mid z)$.
  • Utilities:
    • Rank source models for transfer (model zoo selection).
    • Screen source-target pairings for joint/multi-task grouping.
    • Guide decisions on fine-tuning necessity and anticipate convergence rates.
  • Domain transferability: LEEP is not symmetric; transfer from hard→easy yields higher scores than the reverse.
  • RF applications: In scenarios such as modulation classification, LEEP aligns with domain proximity (SNR/FO), and can guide rapid source selection without retraining (Nguyen et al., 2020, Wong et al., 2022).
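For the model-zoo use case, ranking candidates reduces to one LEEP evaluation per (source model, target set) pair. The helper below is a hypothetical sketch written for this article (the function and model names are illustrative, not a published API):

```python
import numpy as np

def leep(probs, labels, eps=1e-12):
    """LEEP score from (n, |Z|) softmax outputs and (n,) integer target labels."""
    n = len(labels)
    num_y = int(labels.max()) + 1
    joint = np.stack([probs[labels == y].sum(axis=0) for y in range(num_y)]) / n
    cond = joint / (joint.sum(axis=0) + eps)
    return float(np.mean(np.log((cond[labels] * probs).sum(axis=1) + eps)))

def rank_source_models(model_probs, labels):
    """Rank candidate source models by LEEP, best (least negative) first.

    model_probs: dict mapping a model name to its (n, |Z|) softmax outputs on
    the target set; |Z| may differ across models.
    """
    scores = {name: leep(p, labels) for name, p in model_probs.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
```

Because each score needs only the models' cached softmax outputs on the target set, an entire model zoo can be screened before committing to a single fine-tuning run.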

7. Limitations and Considerations

  • Data sparsity: With very few examples per target class (e.g., $< 5$), empirical estimates $\hat P(y \mid z)$ become unreliable, raising LEEP's variance.
  • Source model dependency: LEEP accuracy presumes a reasonably well-trained source classifier.
  • Feature usage: LEEP operates on softmax outputs; it does not explicitly exploit intermediate feature activations.
  • Architectural effects: It may not capture nuanced behaviors when fine-tuning is highly architecture- or hyperparameter-sensitive.
  • Scope: Requires softmax-based source models and compatible input spaces; does not generalize to non-classification or regression tasks without modification (Nguyen et al., 2020, Wong et al., 2022).

LEEP stands as a theoretically grounded, computationally efficient, and empirically validated metric for assessing source model transferability across numerous domains and regimes. Its design, tight theoretical bounds, and high empirical correlation with actual transfer performance make it a practical decision metric in both research and applied transfer learning settings (Nguyen et al., 2020, Wong et al., 2022).

