
HyperHELM: Hyperbolic mRNA Modeling

Updated 20 January 2026
  • HyperHELM is a masked language modeling framework that leverages hyperbolic geometry to explicitly encode the codon-to-amino acid hierarchy.
  • It integrates a 10-layer Euclidean transformer with a hyperbolic head that projects outputs into the Poincaré ball, inducing hierarchical representations.
  • Empirical evaluations demonstrate HyperHELM's superior performance on property prediction and annotation tasks, especially under out-of-distribution conditions.

HyperHELM is a framework for masked language modeling of mRNA sequences that explicitly utilizes hyperbolic geometry to encode biological hierarchies, specifically the codon-to-amino-acid relationship. HyperHELM integrates a standard Euclidean transformer backbone with a lightweight hyperbolic head, enabling hierarchical structure induction through hyperbolic representation and direct masked language prediction in the Poincaré ball. This approach outperforms several Euclidean and hierarchy-agnostic baselines across property prediction and annotation tasks, demonstrating superior out-of-distribution generalization and improved downstream performance in settings where biological encoding hierarchies are prominent (Spengler et al., 29 Sep 2025).

1. Mathematical Framework: The Poincaré Ball Model

HyperHELM operates in the $n$-dimensional Poincaré ball $\mathbb{D}_c^n$ of constant negative curvature $-c$, formalized as:

$$\mathbb{D}_c^n = \{x \in \mathbb{R}^n : \|x\|^2 < 1/c\}, \qquad \mathfrak{g}_c^n(x) = \lambda_x^c I_n, \qquad \lambda_x^c = \frac{2}{1 - c\|x\|^2}$$

Core operations include:

  • Hyperbolic Distance:

$$d^c(x, y) = \frac{1}{\sqrt{c}} \cosh^{-1}\left(1 + 2c\,\frac{\|x - y\|^2}{(1 - c\|x\|^2)(1 - c\|y\|^2)}\right)$$

  • Möbius Addition:

$$x \oplus_c y = \frac{(1 + 2c\langle x, y\rangle + c\|y\|^2)\,x + (1 - c\|x\|^2)\,y}{1 + 2c\langle x, y\rangle + c^2\|x\|^2\|y\|^2}$$

  • Exponential and Logarithmic Maps at the Origin:

$$\exp^c_{\mathbf{0}}(v) = \mathbf{0} \oplus_c \left(\tanh\left(\frac{\sqrt{c}\,\lambda^c_{\mathbf{0}}\,\|v\|}{2}\right) \frac{v}{\sqrt{c}\,\|v\|}\right), \qquad \lambda^c_{\mathbf{0}} = 2$$

$$\log^c_{\mathbf{0}}(x) = \frac{2}{\sqrt{c}\,\lambda^c_{\mathbf{0}}} \tanh^{-1}(\sqrt{c}\,\|x\|)\,\frac{x}{\|x\|}$$

These tools enable lifting representations from Euclidean to hyperbolic space and facilitate hyperbolic analogues of neural operators.
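These operations can be sketched directly in NumPy (an illustrative implementation under the definitions above, not code from the paper; the function names and the curvature keyword argument are our own):

```python
import numpy as np

def mobius_add(x, y, c=1.0):
    """Möbius addition x ⊕_c y in the Poincaré ball of curvature -c."""
    xy, x2, y2 = np.dot(x, y), np.dot(x, x), np.dot(y, y)
    num = (1 + 2 * c * xy + c * y2) * x + (1 - c * x2) * y
    den = 1 + 2 * c * xy + c ** 2 * x2 * y2
    return num / den

def dist(x, y, c=1.0):
    """Hyperbolic distance d^c(x, y)."""
    diff2 = np.dot(x - y, x - y)
    denom = (1 - c * np.dot(x, x)) * (1 - c * np.dot(y, y))
    return np.arccosh(1 + 2 * c * diff2 / denom) / np.sqrt(c)

def expmap0(v, c=1.0):
    """Exponential map at the origin (with λ_0^c = 2 the formula
    simplifies to tanh(√c ||v||) v / (√c ||v||))."""
    n = np.linalg.norm(v)
    return v if n == 0 else np.tanh(np.sqrt(c) * n) * v / (np.sqrt(c) * n)

def logmap0(x, c=1.0):
    """Logarithmic map at the origin (inverse of expmap0)."""
    n = np.linalg.norm(x)
    return x if n == 0 else np.arctanh(np.sqrt(c) * n) * x / (np.sqrt(c) * n)
```

As a sanity check, `logmap0` inverts `expmap0`, and the distance from the origin to `expmap0(v)` equals `2‖v‖` (the origin's conformal factor is 2).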

2. Model Architecture and Hierarchy Encoding

HyperHELM employs a hybrid architecture:

  • Backbone: A 10-layer transformer with standard Euclidean geometry, hidden size 640 (∼50 M parameters).
  • Hyperbolic Head: The Euclidean outputs are projected into $\mathbb{D}_c^n$ using the exponential map, then transformed via a hyperbolic linear layer:

$$\mathrm{HypLin}^c(x) = \exp^c_{\mathbf{0}}\left(W \log^c_{\mathbf{0}}(x) + b\right)$$

Prototypes $\phi(u)$ representing codon leaf nodes are precomputed by embedding the codon–amino-acid tree $T = (V, E)$ into $\mathbb{D}_c^n$ with the distortion-minimizing HS-DTE algorithm. This ensures that codons under the same amino acid map close together in hyperbolic space, while different branches are exponentially separated:

$$d^c(\phi(u), \phi(v)) \approx d_T(u, v) \quad \forall\, u, v \in V$$
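The hyperbolic linear layer can be sketched by composing the logarithmic map, a Euclidean affine map, and the exponential map (a minimal NumPy illustration under the stated formula; `hyp_linear` and the helper names are our own, not the authors' code):

```python
import numpy as np

def expmap0(v, c=1.0):
    """Exponential map at the origin of the Poincaré ball."""
    n = np.linalg.norm(v)
    return v if n == 0 else np.tanh(np.sqrt(c) * n) * v / (np.sqrt(c) * n)

def logmap0(x, c=1.0):
    """Logarithmic map at the origin (inverse of expmap0)."""
    n = np.linalg.norm(x)
    return x if n == 0 else np.arctanh(np.sqrt(c) * n) * x / (np.sqrt(c) * n)

def hyp_linear(x, W, b, c=1.0):
    """HypLin^c(x) = exp_0^c(W log_0^c(x) + b): pull the point into the
    tangent space at the origin, apply a Euclidean affine map, push back."""
    return expmap0(W @ logmap0(x, c) + b, c)
```

With identity weights and zero bias the layer reduces to the identity, and the output always lands back inside the ball because the exponential map's tanh keeps the norm below the ball radius.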

3. Objective Function in Hyperbolic Space

Masked language modeling masks 15% of codon tokens and predicts them via a prototype classifier in $\mathbb{D}_c^n$. For each masked position $i \in M$, the true token $t_i$ is predicted using a softmax over similarity scores $s(z_i, \phi(u))$:

$$p(t_i = u \mid x) = \frac{\exp(\beta\, s(z_i, \phi(u)))}{\sum_v \exp(\beta\, s(z_i, \phi(v)))}$$

Two choices of similarity function $s$ are evaluated:

  • Negative Hyperbolic Distance: $s_{\mathrm{dist}}(x, y) = -d^c(x, y)$
  • Entailment-Cone Energy:

$$E(x, y) = \max\left\{0,\ \Xi(x, y) - \eta\, \psi(x)\right\}$$

with $\psi(x)$ the cone half-aperture and $\Xi(x, y)$ the axis angle.

The pretraining loss is cross-entropy:

$$\mathcal{L}_{\mathrm{MLM}} = -\frac{1}{|M|} \sum_{i \in M} \log p(t_i \mid x)$$

An additional hierarchical cross-entropy (HXE) regularizer (weight $\alpha = 0.2$) promotes consistency with the ancestor hierarchy.
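The distance-based prototype classifier and the masked cross-entropy can be sketched as follows (a hedged NumPy illustration; the function names, log-softmax arrangement, and toy prototypes are our own assumptions, and the HXE regularizer is omitted):

```python
import numpy as np

def poincare_dist(x, y, c=1.0):
    """Hyperbolic distance between two points in the Poincaré ball."""
    diff2 = np.sum((x - y) ** 2)
    denom = (1 - c * np.sum(x * x)) * (1 - c * np.sum(y * y))
    return np.arccosh(1 + 2 * c * diff2 / denom) / np.sqrt(c)

def prototype_log_probs(z, prototypes, beta=10.0, c=1.0):
    """log p(t = u | x): softmax over s_dist(z, φ(u)) = -d^c(z, φ(u)),
    scaled by temperature beta."""
    logits = np.array([-beta * poincare_dist(z, p, c) for p in prototypes])
    logits -= logits.max()                       # numerical stability
    return logits - np.log(np.exp(logits).sum())

def mlm_loss(masked_embeddings, targets, prototypes, beta=10.0, c=1.0):
    """Cross-entropy averaged over the masked positions M."""
    return -np.mean([prototype_log_probs(z, prototypes, beta, c)[t]
                     for z, t in zip(masked_embeddings, targets)])
```

A masked embedding lying near a prototype receives the highest probability for that prototype's codon, which is exactly the inductive bias the tree embedding is meant to exploit.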

4. Training Strategy and Implementation

  • Corpus: Observed Antibody Space (OAS), with codon-level tokenization (vocabulary: 64 codons plus special tokens).
  • Training setup: batch size of 1024 sequences on 8×A100 GPUs for 40 epochs.
  • Input: maximum length of 444 codons (positional encodings up to 2048 positions).
  • Optimization: AdamW, warmup to a peak learning rate of $1 \times 10^{-4}$, cosine decay to $1 \times 10^{-5}$, weight decay $1 \times 10^{-1}$.
  • Hyperbolic head: prototype dimension $n_p = 128$, curvature $c = 1.0$, temperature $\beta = 10$.
  • Regularization: HXE with $\alpha = 0.2$.
  • Fine-tuning/Probing: frozen backbone with a TextCNN head (100 channels); learning rate and batch size varied per task.
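The warmup-plus-cosine-decay schedule described above can be sketched as (a minimal illustration of the stated hyperparameters; `lr_schedule` is our own helper, not from the paper):

```python
import math

def lr_schedule(step, total_steps, warmup_steps, lr_max=1e-4, lr_min=1e-5):
    """Linear warmup to lr_max, then cosine decay down to lr_min."""
    if step < warmup_steps:
        return lr_max * (step + 1) / warmup_steps
    t = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return lr_min + 0.5 * (lr_max - lr_min) * (1 + math.cos(math.pi * t))
```

The schedule peaks at $1 \times 10^{-4}$ at the end of warmup and decays smoothly to $1 \times 10^{-5}$ at the final step.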

5. Empirical Evaluation and Quantitative Results

HyperHELM was assessed on ten property-prediction tasks and an antibody region annotation task, spanning datasets such as Ab1, Ab2, mRFP, COVID-19 Vaccine, Drosophila, S. cerevisiae, P. pastoris, Fungal, E. coli, iCodon, and a held-out antibody annotation set.

Comparison baselines included Transformer XE, RNA-FM, SpliceBERT, CodonBERT, and Euclidean HELM. Metrics were Spearman $\rho$ (regression) and accuracy (classification/annotation).

Selected results:

  • On 9/10 property prediction tasks, HyperHELM surpassed HELM by ∼10% on average.
  • Substantial improvement in D. melanogaster (ρ=0.450 vs. 0.341, +32%) and E. coli (accuracy=50.8% vs. 45.8%, +10.9%).
  • On the antibody region annotation task, HyperHELM(dist) achieved 76.48% accuracy vs. HELM's 73.48% (+3%).
| Task / Dataset | Metric & Value (HyperHELM) | Relative Gain vs. HELM |
|---|---|---|
| D. melanogaster | ρ = 0.450 | +32% |
| E. coli | acc = 50.8% | +10.9% |
| iCodon | ρ = 0.539 | +2.7% |
| Ab1 | ρ = 0.751 | +5.1% |
| Antibody Annotation | acc = 76.48% | +3% |

6. Out-of-Distribution Generalization

HyperHELM exhibited improved generalization under resource-scarce or compositionally shifted conditions:

  • Sequence Length (P. pastoris): For long sequences (2000–3000 nt), accuracy rose to 0.70 (vs. HELM 0.46).
  • GC Content (COVID-19): For high GC content (>55%), 0.62 vs. HELM's 0.56; for medium (47–55%), 0.73 vs. 0.64.
  • Codon Usage Bias: Datasets with stronger bias (lower ENC) saw larger gains, indicating alignment between hyperbolic inductive bias and tree-like codon–amino-acid structure.

A plausible implication is that low-data and distributionally biased regimes particularly benefit from the model's ability to encode hierarchical structure in hyperbolic space.

7. Significance and Alignment with Biological Structure

The empirical results confirm that the hyperbolic head, via codon hierarchy–aligned prototypes, provides an effective inductive bias for both in-distribution and out-of-distribution scenarios. HyperHELM’s architecture is compatible with modern hardware (due to Euclidean backbone) while achieving a more faithful representation of the biological hierarchy compared to Euclidean models. This suggests broader relevance for modeling biological sequences whose underlying generating process is hierarchical (Spengler et al., 29 Sep 2025).
