Characterize the organization of biological features within RNA LM embeddings

Characterize how the embeddings produced by RNA language models internally organize biological features of RNA sequences.

Background

The authors point out that, although RNA LLMs such as RiNALMo support diverse downstream tasks, it remains unclear how their embedding spaces structure biological information. This gap motivates interpretability at the level of internal representations, going beyond attributions computed on model outputs.

SAE-RNA aims to discover sparse, interpretable features in these embeddings and relate them to biological structures and motifs. It is intended as an initial step toward understanding how RNA LMs internally organize biological features.
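The general idea of a sparse autoencoder (SAE) over embeddings can be sketched as follows. SAE-RNA's exact architecture, objective, and hyperparameters are not given here, so this is a generic ReLU SAE in NumPy, a minimal sketch assuming embeddings arrive as an `(n_positions, d_model)` array. The names `SparseAutoencoder` and `top_activating_positions` are hypothetical, and the data is a synthetic stand-in rather than real RiNALMo output.

```python
import numpy as np


class SparseAutoencoder:
    """Generic ReLU sparse autoencoder (hypothetical sketch, not SAE-RNA's code).

    f = ReLU(x @ W_enc + b_enc), x_hat = f @ W_dec + b_dec,
    loss = mean squared reconstruction error + l1_coeff * mean L1(f).
    """

    def __init__(self, d_model, n_features, l1_coeff=1e-3, seed=0):
        rng = np.random.default_rng(seed)
        self.W_enc = rng.normal(0.0, 1.0 / np.sqrt(d_model), (d_model, n_features))
        self.b_enc = np.zeros(n_features)
        self.W_dec = self.W_enc.T.copy()  # tied at init, trained independently
        self.b_dec = np.zeros(d_model)
        self.l1_coeff = l1_coeff

    def encode(self, x):
        # Non-negative feature activations; the L1 penalty pushes most to zero.
        return np.maximum(0.0, x @ self.W_enc + self.b_enc)

    def decode(self, f):
        # Reconstruct each embedding as a weighted sum of dictionary directions.
        return f @ self.W_dec + self.b_dec

    def loss(self, x):
        f = self.encode(x)
        err = self.decode(f) - x
        mse = np.mean(np.sum(err ** 2, axis=1))
        l1 = np.mean(np.sum(f, axis=1))  # f >= 0, so sum(f) equals sum(|f|)
        return mse + self.l1_coeff * l1

    def step(self, x, lr=0.02):
        """One full-batch gradient-descent step with hand-written backprop."""
        n = x.shape[0]
        pre = x @ self.W_enc + self.b_enc
        f = np.maximum(0.0, pre)
        err = f @ self.W_dec + self.b_dec - x
        d_xhat = 2.0 * err / n                           # dLoss/dx_hat
        d_f = d_xhat @ self.W_dec.T + self.l1_coeff / n  # dLoss/df where f > 0
        d_pre = d_f * (pre > 0)                          # ReLU gate
        g_W_dec, g_b_dec = f.T @ d_xhat, d_xhat.sum(axis=0)
        g_W_enc, g_b_enc = x.T @ d_pre, d_pre.sum(axis=0)
        self.W_dec -= lr * g_W_dec
        self.b_dec -= lr * g_b_dec
        self.W_enc -= lr * g_W_enc
        self.b_enc -= lr * g_b_enc


def top_activating_positions(sae, embeddings, feature, k=5):
    """Indices of the k positions (e.g. nucleotides) where `feature` fires most."""
    acts = sae.encode(embeddings)[:, feature]
    return np.argsort(acts)[::-1][:k]


# Synthetic stand-in for per-nucleotide embeddings: sparse mixtures of
# 8 hidden unit-norm directions in a 16-dimensional space.
rng = np.random.default_rng(1)
dictionary = rng.normal(size=(8, 16))
dictionary /= np.linalg.norm(dictionary, axis=1, keepdims=True)
codes = rng.random((256, 8)) * (rng.random((256, 8)) < 0.2)
x = codes @ dictionary

sae = SparseAutoencoder(d_model=16, n_features=32, l1_coeff=1e-3)
before = sae.loss(x)
for _ in range(500):
    sae.step(x)
after = sae.loss(x)
top = top_activating_positions(sae, x, feature=0, k=5)
```

Ranking positions by a single feature's activation is one simple starting point for relating a feature to sequence motifs or structural elements. Practical SAEs are usually trained with minibatches and an optimizer such as Adam, and many implementations also constrain decoder directions to unit norm; both are omitted here for brevity.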

References

RNA LLMs (e.g., RiNALMo) show promise for downstream tasks, yet it is unclear how their embeddings internally organize biological features.

SAE-RNA: A Sparse Autoencoder Model for Interpreting RNA Language Model Representations (2510.02734 - Kim et al., 3 Oct 2025) in Section 2.3 (RNA Language Models)