OFS Pseudo-perplexity for Protein Fitness
- The paper introduces OFS pseudo-perplexity, an efficient one-pass approximation for protein sequence fitness evaluation.
- It leverages regression from unmasked residue embeddings to masked token distributions, reducing computational cost compared to traditional methods.
- Empirical results demonstrate high correlation with true pseudo-perplexity and success in applications like sequence design and ancestral protein stability analysis.
One Fell Swoop (OFS) pseudo-perplexity is an efficient approximation of masked language model (MLM) pseudo-perplexity for protein sequence fitness estimation. It uses a regression from unmasked residue embeddings to masked token distributions, quantifying model uncertainty in a single forward pass. OFS pseudo-perplexity retains most of the predictive power of true pseudo-perplexity at a fraction of the computational cost, facilitating high-throughput scoring, rapid sequence sampling, and in-depth analysis of sequence design and stability (Kantroo et al., 2024).
1. Pseudo-perplexity in Masked Language Models
Pseudo-perplexity quantifies a masked language model's uncertainty about a protein sequence. Given a sequence $x = (x_1, \dots, x_L)$ of length $L$ over a vocabulary $\mathcal{V}$ (the 20 amino acids plus special tokens), the model is trained to predict masked residues from the surrounding context. Summing the per-position masked log-probabilities gives the pseudo-log-likelihood

$$\mathrm{PLL}(x) = \sum_{i=1}^{L} \log p\bigl(x_i \mid x_{\setminus i}\bigr),$$

where $x_{\setminus i}$ is $x$ with the $i$-th position replaced by a [MASK] token. From $\mathrm{PLL}(x)$, pseudo-perplexity is defined as

$$\mathrm{PPPL}(x) = \exp\!\left(-\tfrac{1}{L}\,\mathrm{PLL}(x)\right).$$

Lower PPPL values indicate sequences the model considers more “natural” and, by consequence, of higher predicted fitness. The standard calculation of $\mathrm{PLL}(x)$ requires $L$ independent forward passes, one per masked position, rendering true pseudo-perplexity computationally expensive for long sequences (Kantroo et al., 2024).
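As a concrete illustration of the $L$-pass computation, here is a minimal numpy sketch. The `toy_mlm_logits` stand-in, the `MASK` index, and the 21-symbol vocabulary are illustrative assumptions, not the paper's implementation; a real scorer would call a model such as ESM2 in place of the toy function.

```python
import numpy as np

MASK = 20  # hypothetical [MASK] token index in a 21-symbol toy vocabulary

def toy_mlm_logits(seq):
    """Stand-in for a masked-LM forward pass returning (L, 21) logits.
    A seeded pseudo-random model keeps the sketch self-contained."""
    rng = np.random.default_rng(sum((i + 1) * t for i, t in enumerate(seq)) % 2**32)
    return rng.normal(size=(len(seq), 21))

def true_pppl(seq):
    """True pseudo-perplexity: one masked forward pass per position i."""
    pll = 0.0
    for i, aa in enumerate(seq):
        masked = list(seq)
        masked[i] = MASK                      # replace position i with [MASK]
        logits = toy_mlm_logits(masked)[i]    # model output at the masked site
        m = logits.max()
        logp = logits - m - np.log(np.exp(logits - m).sum())  # log softmax
        pll += logp[aa]                       # log p(x_i | x_{\i})
    return float(np.exp(-pll / len(seq)))     # PPPL = exp(-PLL / L)

seq = [3, 7, 1, 14, 9, 0, 18, 5]              # toy integer-encoded sequence
print(true_pppl(seq))
```

Because each position requires its own masked forward pass, the loop body is the cost bottleneck that OFS removes.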
2. One Fell Swoop Methodology
OFS pseudo-perplexity replaces the $L$-pass computation with a single forward pass and a lightweight, position-wise projection:
- Embedding Extraction: The unmasked input sequence is encoded with a pretrained protein language model (e.g., ESM2) to yield an embedding matrix $H \in \mathbb{R}^{L \times d}$, with each row $h_i$ representing the embedding of residue $x_i$.
- Projection to Masked Profiles: An ensemble of multilayer perceptrons (MLPs), each equipped with ReLU activations and layer normalization, is trained to map $h_i$ to a logit vector $z_i \in \mathbb{R}^{|\mathcal{V}|}$. The corresponding predicted masked distribution is $\hat{p}_i = \mathrm{softmax}(z_i)$.
- Training Objective: The projection network is trained by minimizing the cross-entropy between $\hat{p}_i$ and the ground-truth masked distribution $p(\cdot \mid x_{\setminus i})$, derived via one-at-a-time single masking over a large protein sequence corpus (Kantroo et al., 2024).
The OFS pseudo-log-likelihood is then

$$\widehat{\mathrm{PLL}}(x) = \sum_{i=1}^{L} \log \hat{p}_i(x_i),$$

and the OFS pseudo-perplexity

$$\widehat{\mathrm{PPPL}}(x) = \exp\!\left(-\tfrac{1}{L}\,\widehat{\mathrm{PLL}}(x)\right).$$
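The two formulas above can be sketched as a single-pass scorer. The random embeddings and the linear `project` callable are stand-ins for ESM2 and the trained MLP ensemble, chosen only to keep the sketch self-contained.

```python
import numpy as np

def ofs_pppl(H, seq, project):
    """OFS pseudo-perplexity from one unmasked encoder pass.
    H is the (L, d) embedding matrix; `project` maps one embedding h_i
    to a |V|-dim logit vector (the trained MLP ensemble in the paper;
    any callable works for this sketch)."""
    pll = 0.0
    for i, aa in enumerate(seq):
        z = project(H[i])                           # logits z_i
        m = z.max()
        logp = z - m - np.log(np.exp(z - m).sum())  # log softmax
        pll += logp[aa]                             # log p_hat_i(x_i)
    return float(np.exp(-pll / len(seq)))           # exp(-PLL_hat / L)

# toy stand-ins: random "embeddings" and a random linear projection
rng = np.random.default_rng(0)
L, d, V = 8, 16, 21
H = rng.normal(size=(L, d))
W = rng.normal(size=(d, V)) * 0.1
seq = rng.integers(0, 20, size=L)
print(ofs_pppl(H, seq, lambda h: h @ W))
```

Note that the loop here only applies the cheap projection per position; the expensive encoder runs once to produce `H`.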
3. Architecture and Computational Considerations
- Encoder Model: ESM2, a 650M parameter transformer, remains unchanged.
- Projection Ensemble: The projection comprises eight identical MLPs, each with two hidden layers, ReLU activations, and layer normalization, mapping the $d$-dimensional embedding to $|\mathcal{V}|$ logits. The final prediction is the average of the softmax outputs over ensemble members.
- Computational Complexity: True pseudo-perplexity requires $L$ encoder passes, for a total cost of $L \cdot C_{\text{enc}}$, where $C_{\text{enc}}$ is the cost of one encoder forward pass. OFS requires one encoder pass ($C_{\text{enc}}$) plus one projection pass ($C_{\text{proj}}$, a small constant relative to the encoder), yielding total cost $C_{\text{enc}} + C_{\text{proj}}$ compared to $L \cdot C_{\text{enc}}$. For typical sequence lengths, OFS is approximately 300-fold faster at inference (Kantroo et al., 2024).
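A minimal numpy sketch of the ensemble shape described above. The embedding width $d = 1280$ matches ESM2-650M; the 512-unit hidden width and the random weights are illustrative assumptions, not the paper's values.

```python
import numpy as np

def layer_norm(x, eps=1e-5):
    """Normalize the last axis to zero mean, unit variance."""
    return (x - x.mean(-1, keepdims=True)) / (x.std(-1, keepdims=True) + eps)

def mlp(h, params):
    """One ensemble member: two hidden layers with layer norm and ReLU."""
    W1, W2, W3 = params
    a = np.maximum(layer_norm(h @ W1), 0.0)
    b = np.maximum(layer_norm(a @ W2), 0.0)
    return b @ W3  # |V| logits

def ensemble_profile(h, ensemble):
    """Predicted masked profile: average softmax over ensemble members."""
    probs = []
    for params in ensemble:
        z = mlp(h, params)
        ez = np.exp(z - z.max())
        probs.append(ez / ez.sum())
    return np.mean(probs, axis=0)

rng = np.random.default_rng(1)
d, hidden, V = 1280, 512, 21  # hidden width is an assumption for illustration
ensemble = [
    (rng.normal(size=(d, hidden)) * 0.02,
     rng.normal(size=(hidden, hidden)) * 0.02,
     rng.normal(size=(hidden, V)) * 0.02)
    for _ in range(8)
]
profile = ensemble_profile(rng.normal(size=d), ensemble)
print(profile.shape, profile.sum())
```

Averaging in probability space (after the softmax) rather than logit space keeps each member's output a valid distribution before combining.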
4. Empirical Performance and Benchmarking
Substitutions Benchmark
- Performance: On 61 substitution deep mutational scanning assays, OFS PPPL closely tracks true PPPL in aggregate Spearman correlation, with both near the masked-marginal (MM) heuristic for ESM2.
- Comparisons: OFS is slightly weaker than MM on pure substitutions but remains competitive with peer sequence-based fitness models [(Kantroo et al., 2024), Table 1].
Indels Benchmark
- OFS enables ESM2 to score insertions/deletions directly. On four indel assays, OFS PPPL achieves the highest aggregate mean Spearman correlation among sequence-only models on ProteinGym, defining a new state of the art (Table 2, Figure 2C).
Approximation Fidelity
- Profile Entry Accuracy: Mean absolute error for individual profile entries is less than 0.02.
- Correlation: The correlation between $\widehat{\mathrm{PLL}}$ and true $\mathrm{PLL}$ exceeds 0.98 on held-out clusters.
- Fitness Prediction Drop: Using OFS in place of true PPPL costs less than a 5% drop in fitness-prediction performance, confirming high-fidelity regression from embeddings to masked distributions.
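The fidelity and benchmark numbers above rest on rank correlation. A minimal Spearman implementation (a stand-in for `scipy.stats.spearmanr`, without tie handling) shows the computation: rank both vectors, then take the Pearson correlation of the ranks.

```python
import numpy as np

def spearman(a, b):
    """Spearman rank correlation: Pearson correlation on ranks.
    Minimal sketch with no tie handling."""
    ra = np.argsort(np.argsort(a)).astype(float)  # ranks of a
    rb = np.argsort(np.argsort(b)).astype(float)  # ranks of b
    ra, rb = ra - ra.mean(), rb - rb.mean()
    return float((ra @ rb) / np.sqrt((ra @ ra) * (rb @ rb)))

# toy check: a noisy monotone relationship scores near 1
rng = np.random.default_rng(0)
x = rng.normal(size=100)
y = x + 0.01 * rng.normal(size=100)
print(round(spearman(x, y), 3))
```

Rank correlation is the natural metric here because fitness assays and model scores live on different scales; only the ordering of variants matters.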
5. Applications
Monte Carlo Exploration of Sequence Space
OFS PPPL serves as an effective energy function in Metropolis–Hastings MCMC sampling. Mutation candidates are proposed from the OFS profile, and a proposed move from $x$ to $x'$ is accepted with probability

$$\alpha(x \to x') = \min\!\left(1,\; \frac{q(x \mid x')}{q(x' \mid x)}\, e^{-\beta\,[E(x') - E(x)]}\right),$$

where $E(x) = \log \widehat{\mathrm{PPPL}}(x)$ is the OFS energy, $\beta$ an inverse temperature, and $q$ the profile-based proposal distribution.
This protocol enables rapid generation of diverse, high-confidence protein variants (validity verified using pLDDT scores) in minutes (Figure 2 in (Kantroo et al., 2024)).
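The sampling loop can be sketched as follows. The `energy` and `profile` callables are toy stand-ins for the ESM2-based quantities, and with the uniform toy proposal the Hastings correction $q(x \mid x')/q(x' \mid x)$ equals 1 and is omitted.

```python
import numpy as np

def mh_sample(seq, energy, profile, n_steps=500, beta=3.0, seed=0):
    """Metropolis–Hastings over sequences with an OFS-style energy.
    `energy(seq)` plays the role of log OFS-PPPL; `profile(seq, i)`
    returns a |V|-dim proposal distribution for site i. Both are toy
    stand-ins here, not the real ESM2-based quantities."""
    rng = np.random.default_rng(seed)
    x, e = list(seq), energy(seq)
    for _ in range(n_steps):
        i = rng.integers(len(x))
        p = profile(x, i)
        aa = int(rng.choice(len(p), p=p))      # propose a mutation at site i
        x_new = list(x)
        x_new[i] = aa
        e_new = energy(x_new)
        if rng.random() < np.exp(-beta * (e_new - e)):  # MH acceptance
            x, e = x_new, e_new
    return x, e

# toy setup: 21-letter alphabet; energy counts residues differing from 0,
# so the chain should drift toward low-energy (mostly-0) sequences
V = 21
energy = lambda s: sum(a != 0 for a in s)
profile = lambda s, i: np.full(V, 1.0 / V)
final, e_final = mh_sample([5] * 10, energy, profile)
print(e_final)
```

A profile-based (non-uniform) proposal, as in the paper, would require restoring the $q$ ratio in the acceptance test to keep the chain's stationary distribution correct.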
Fitness Estimation and Ancestral Stability
OFS PPPL enables direct ranking of extant and reconstructed ancestral proteins. Among 257 Pfam families, 79.8% exhibited lower PPPL (higher predicted stability) for ancestral reconstructions versus extant sequences (Cliff's $\delta$ effect size; Figure 3), recapitulating the "old-is-gold" effect in ancestral sequence stability.
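Cliff's $\delta$ is a simple pairwise effect size: the probability that a value from one group exceeds a value from the other, minus the reverse. A minimal sketch on hypothetical PPPL values (the numbers below are illustrative, not from the paper):

```python
import numpy as np

def cliffs_delta(a, b):
    """Cliff's delta: P(a > b) - P(a < b) over all cross-group pairs."""
    a = np.asarray(a)[:, None]  # column vector
    b = np.asarray(b)[None, :]  # row vector
    return float((a > b).mean() - (a < b).mean())

ext = [1.5, 1.4, 1.6, 1.3]  # hypothetical extant PPPL values
anc = [1.2, 1.1, 1.3, 1.0]  # hypothetical ancestral PPPL values (lower = better)
print(cliffs_delta(ext, anc))  # → 0.9375
```

A $\delta$ near 1 here means extant PPPL almost always exceeds ancestral PPPL, matching the "old-is-gold" direction.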
6. Limitations and Considerations
- In-context Repetition: Over long MCMC chains, OFS-guided sampling sometimes drifts toward low-diversity, repetitive sequence patterns, an artifact of the encoder's preference for in-context repeats.
- Encoder Training Bias: Phylogenetic biases in the encoder’s training data can skew sequence space exploration toward over-represented clades.
- Systematic Errors: The projection network inherits any systematic errors of the base encoder.
- Approximate Nature: OFS remains a predictive approximation; while Spearman correlation is high, minor performance drops relative to true PPPL persist in certain benchmarks.
7. Summary and Significance
One Fell Swoop pseudo-perplexity transforms the computationally intensive MLM pseudo-perplexity calculation into an efficient single-step procedure by regressing from unmasked representations to masked residue distributions. It preserves nearly all predictive power of the original metric, enables scoring for sequence variants with indels, and underlies efficient sequence design routines such as MCMC exploration and ancestral sequence analysis. These properties establish OFS as a practical tool for protein fitness estimation and generative modeling in computational biology (Kantroo et al., 2024).