HyperHELM: Hyperbolic mRNA Modeling
- HyperHELM is a masked language modeling framework that leverages hyperbolic geometry to explicitly encode the codon-to-amino acid hierarchy.
- It integrates a 10-layer Euclidean transformer with a hyperbolic head that projects outputs into the Poincaré ball, inducing hierarchical representations.
- Empirical evaluations demonstrate HyperHELM's superior performance on property prediction and annotation tasks, especially under out-of-distribution conditions.
HyperHELM is a framework for masked language modeling of mRNA sequences that explicitly utilizes hyperbolic geometry to encode biological hierarchies, specifically the codon-to-amino-acid relationship. HyperHELM integrates a standard Euclidean transformer backbone with a lightweight hyperbolic head, enabling hierarchical structure induction through hyperbolic representation and direct masked language prediction in the Poincaré ball. This approach outperforms several Euclidean and hierarchy-agnostic baselines across property prediction and annotation tasks, demonstrating superior out-of-distribution generalization and improved downstream performance in settings where biological encoding hierarchies are prominent (Spengler et al., 29 Sep 2025).
1. Mathematical Framework: The Poincaré Ball Model
HyperHELM operates in the $n$-dimensional Poincaré ball of constant negative curvature $-c$ (with $c > 0$), formalized as $\mathbb{B}^n_c = \{x \in \mathbb{R}^n : c\|x\|^2 < 1\}$. Core operations include:
- Hyperbolic Distance: $d_c(x, y) = \frac{2}{\sqrt{c}} \tanh^{-1}\!\big(\sqrt{c}\,\|(-x) \oplus_c y\|\big)$
- Möbius Addition: $x \oplus_c y = \frac{(1 + 2c\langle x, y\rangle + c\|y\|^2)\,x + (1 - c\|x\|^2)\,y}{1 + 2c\langle x, y\rangle + c^2\|x\|^2\|y\|^2}$
- Exponential and Logarithmic Maps at the Origin: $\exp_0^c(v) = \tanh(\sqrt{c}\,\|v\|)\,\frac{v}{\sqrt{c}\,\|v\|}$, $\quad \log_0^c(y) = \tanh^{-1}(\sqrt{c}\,\|y\|)\,\frac{y}{\sqrt{c}\,\|y\|}$
These tools enable lifting representations from Euclidean to hyperbolic space and facilitate hyperbolic analogues of neural operators.
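As a concrete reference, the operations above can be sketched in plain NumPy. This is a minimal illustration, not the paper's implementation; in practice one would use a Riemannian-optimization library such as geoopt:

```python
import numpy as np

def mobius_add(x, y, c=1.0):
    """Möbius addition x ⊕_c y on the Poincaré ball of curvature -c."""
    xy, x2, y2 = np.dot(x, y), np.dot(x, x), np.dot(y, y)
    num = (1 + 2 * c * xy + c * y2) * x + (1 - c * x2) * y
    den = 1 + 2 * c * xy + c ** 2 * x2 * y2
    return num / den

def poincare_dist(x, y, c=1.0):
    """d_c(x, y) = (2 / sqrt(c)) * artanh(sqrt(c) * ||(-x) ⊕_c y||)."""
    diff = mobius_add(-x, y, c)
    return 2.0 / np.sqrt(c) * np.arctanh(np.sqrt(c) * np.linalg.norm(diff))

def expmap0(v, c=1.0):
    """Exponential map at the origin: tangent vector -> point in the ball."""
    n = np.linalg.norm(v)
    return v if n == 0 else np.tanh(np.sqrt(c) * n) * v / (np.sqrt(c) * n)

def logmap0(y, c=1.0):
    """Logarithmic map at the origin: point in the ball -> tangent vector."""
    n = np.linalg.norm(y)
    return y if n == 0 else np.arctanh(np.sqrt(c) * n) * y / (np.sqrt(c) * n)
```

Note that `logmap0` inverts `expmap0` exactly, and that the distance of a point to the origin reduces to $2\tanh^{-1}(\|x\|)$ at unit curvature, which gives quick sanity checks for any implementation.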
2. Model Architecture and Hierarchy Encoding
HyperHELM employs a hybrid architecture:
- Backbone: A 10-layer transformer with standard Euclidean geometry, hidden size 640 (∼50 M parameters).
- Hyperbolic Head: The Euclidean outputs are projected into $\mathbb{B}^n_c$ using the exponential map, then transformed via a hyperbolic linear layer: $y = \exp_0^c\!\big(W \log_0^c(x)\big) \oplus_c b$
Prototypes representing codon leaf nodes are precomputed by embedding the codon–amino-acid tree into $\mathbb{B}^n_c$ with the distortion-minimizing HS-DTE algorithm. This ensures that codons under the same amino acid map close together in hyperbolic space, while different branches are exponentially separated.
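A minimal sketch of such a head, assuming a Ganea-style Möbius linear layer (the parameters `W` and `b` are hypothetical placeholders, and the HS-DTE prototype embedding is not reproduced here):

```python
import numpy as np

def _expmap0(v, c):
    n = np.linalg.norm(v)
    return v if n == 0 else np.tanh(np.sqrt(c) * n) * v / (np.sqrt(c) * n)

def _logmap0(y, c):
    n = np.linalg.norm(y)
    return y if n == 0 else np.arctanh(np.sqrt(c) * n) * y / (np.sqrt(c) * n)

def _mobius_add(x, y, c):
    xy, x2, y2 = np.dot(x, y), np.dot(x, x), np.dot(y, y)
    num = (1 + 2 * c * xy + c * y2) * x + (1 - c * x2) * y
    return num / (1 + 2 * c * xy + c ** 2 * x2 * y2)

def hyperbolic_head(h_euclidean, W, b, c=1.0):
    """Lift a Euclidean transformer output into the ball, then apply a
    hyperbolic linear layer: exp_0(W log_0(exp_0(h))) ⊕_c b."""
    z = _expmap0(h_euclidean, c)            # project into the Poincaré ball
    z = _expmap0(W @ _logmap0(z, c), c)     # Möbius-style linear map
    return _mobius_add(z, b, c)             # Möbius bias addition
```

Because every step maps back into the ball, the output is guaranteed to lie strictly inside the unit ball, where it can be compared against the precomputed codon prototypes.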
3. Objective Function in Hyperbolic Space
Masked language modeling masks 15% of codon tokens, predicting them via a prototype classifier in $\mathbb{B}^n_c$. For each masked position $i$ with true token $y_i$, the model computes a softmax over similarity scores $s(z_i, p_k)$ between the position's hyperbolic embedding $z_i$ and the codon prototypes $p_k$: $P(y_i = k \mid z_i) = \frac{\exp\big(s(z_i, p_k)/\tau\big)}{\sum_{k'} \exp\big(s(z_i, p_{k'})/\tau\big)}$ Two choices of similarity function are evaluated:
- Negative Hyperbolic Distance: $s_{\mathrm{dist}}(z, p) = -d_c(z, p)$
- Entailment-Cone Energy: $s_{\mathrm{cone}}(z, p) = -\max\big(0,\ \Xi(p, z) - \psi(p)\big)$
with $\psi(p)$ the cone half-aperture at prototype $p$ and $\Xi(p, z)$ the angle of $z$ from the cone's axis.
The pretraining loss is cross-entropy over the set of masked positions $M$: $\mathcal{L} = -\sum_{i \in M} \log P(y_i \mid z_i)$ An additional hierarchical cross-entropy (HXE) regularizer (weight $\alpha$) promotes consistency with the ancestor hierarchy.
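The negative-distance variant of the prototype classifier and its cross-entropy loss can be sketched as follows. This is an illustration at unit curvature with a hypothetical temperature `tau`, not the paper's training code:

```python
import numpy as np

def _mobius_add(x, y, c):
    xy, x2, y2 = np.dot(x, y), np.dot(x, x), np.dot(y, y)
    num = (1 + 2 * c * xy + c * y2) * x + (1 - c * x2) * y
    return num / (1 + 2 * c * xy + c ** 2 * x2 * y2)

def poincare_dist(x, y, c=1.0):
    diff = _mobius_add(-x, y, c)
    return 2.0 / np.sqrt(c) * np.arctanh(np.sqrt(c) * np.linalg.norm(diff))

def prototype_log_probs(z, prototypes, c=1.0, tau=1.0):
    """Log-softmax over negative hyperbolic distances to codon prototypes."""
    scores = np.array([-poincare_dist(z, p, c) / tau for p in prototypes])
    scores -= scores.max()                      # numerical stability
    return scores - np.log(np.exp(scores).sum())

def mlm_loss(z, prototypes, true_idx, c=1.0, tau=1.0):
    """Cross-entropy for one masked position against its true codon."""
    return -prototype_log_probs(z, prototypes, c, tau)[true_idx]
```

An embedding that lands near a prototype receives the highest probability for that codon, so minimizing this loss pulls masked-position embeddings toward the hierarchy-aligned prototype of the true token.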
4. Training Strategy and Implementation
- Corpus: Observed Antibody Sequences (OAS), with codon-level tokenization (vocabulary: 64 codons plus special tokens).
- Hardware: 1024 sequences/batch, 8×A100 GPUs, 40 epochs.
- Architecture parameters: Maximum input length of 444 codons; positional encoding supports up to 2048 positions.
- Optimization: AdamW with linear warmup to a peak learning rate, cosine decay to a final learning rate, and weight decay.
- Hyperbolic Head: prototype dimension $n$, curvature $c$, and softmax temperature $\tau$.
- Regularization: HXE with weight $\alpha$.
- Fine-tuning/Probing: Frozen backbone with a TextCNN probe (100 channels); learning rate and batch size varied per task.
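The warmup-plus-cosine schedule described above can be sketched as follows (the peak and final learning-rate values are placeholders, since this summary omits the exact numbers):

```python
import math

def lr_schedule(step, total_steps, warmup_steps, peak_lr, final_lr):
    """Linear warmup to peak_lr over warmup_steps, then cosine decay to
    final_lr over the remaining steps."""
    if step < warmup_steps:
        return peak_lr * (step + 1) / warmup_steps
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return final_lr + 0.5 * (peak_lr - final_lr) * (1 + math.cos(math.pi * progress))
```

The schedule reaches `peak_lr` exactly at the end of warmup and decays smoothly to `final_lr` at the last step, which is the common pattern for transformer pretraining with AdamW.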
5. Empirical Evaluation and Quantitative Results
HyperHELM was assessed on ten property-prediction tasks and an antibody region annotation task, spanning datasets such as Ab1, Ab2, mRFP, COVID-19 Vaccine, Drosophila, S. cerevisiae, P. pastoris, Fungal, E. coli, iCodon, and a held-out antibody annotation set.
Comparison baselines included Transformer XE, RNA-FM, SpliceBERT, CodonBERT, and Euclidean HELM. Metrics were Spearman (regression) and accuracy (classification/annotation).
Selected results:
- HyperHELM surpassed HELM on 9 of 10 property-prediction tasks, by ∼10% on average.
- Substantial improvement in D. melanogaster (ρ=0.450 vs. 0.341, +32%) and E. coli (accuracy=50.8% vs. 45.8%, +10.9%).
- On the antibody region annotation task, HyperHELM(dist) achieved 76.48% accuracy vs. HELM's 73.48% (a 3.0-point absolute gain, +4.1% relative).
| Task / Dataset | Metric & Value (HyperHELM) | Relative Gain vs. HELM |
|---|---|---|
| D. melanogaster | ρ = 0.450 | +32% |
| E. coli | acc = 50.8% | +10.9% |
| iCodon | ρ = 0.539 | +2.7% |
| Ab1 | ρ = 0.751 | +5.1% |
| Antibody Annotation | acc = 76.48% | +4.1% |
6. Out-of-Distribution Generalization
HyperHELM exhibited improved generalization under resource-scarce or compositionally shifted conditions:
- Sequence Length (P. pastoris): For long sequences (2000–3000 nt), accuracy rose to 0.70 (vs. HELM 0.46).
- GC Content (COVID-19): For high GC content (>55%), 0.62 vs. HELM's 0.56; for medium (47–55%), 0.73 vs. 0.64.
- Codon Usage Bias: Datasets with stronger bias (lower ENC) saw larger gains, indicating alignment between hyperbolic inductive bias and tree-like codon–amino-acid structure.
A plausible implication is that low-data and distributionally biased regimes particularly benefit from the model's ability to encode hierarchical structure in hyperbolic space.
7. Significance and Alignment with Biological Structure
The empirical results confirm that the hyperbolic head, via codon hierarchy–aligned prototypes, provides an effective inductive bias for both in-distribution and out-of-distribution scenarios. HyperHELM’s architecture is compatible with modern hardware (due to Euclidean backbone) while achieving a more faithful representation of the biological hierarchy compared to Euclidean models. This suggests broader relevance for modeling biological sequences whose underlying generating process is hierarchical (Spengler et al., 29 Sep 2025).