Mix-Geneformer: Unified Representation Learning for Human and Mouse scRNA-seq Data (2507.07454v1)

Published 10 Jul 2025 in q-bio.GN

Abstract: Single-cell RNA sequencing (scRNA-seq) enables transcriptomic profiling of individual cells, revealing cellular heterogeneity and rare populations. Recent deep learning models like Geneformer and Mouse-Geneformer perform well on tasks such as cell-type classification and in silico perturbation. However, their species-specific design limits cross-species generalization, which is crucial for translational research and drug discovery. We present Mix-Geneformer, a novel Transformer-based model that integrates human and mouse scRNA-seq data into a unified representation via a hybrid self-supervised approach combining Masked Language Modeling (MLM) and SimCSE-based contrastive loss to capture both shared and species-specific gene patterns. A rank-value encoding scheme further emphasizes high-variance gene signals during training. Trained on about 50 million cells from diverse human and mouse organs, Mix-Geneformer matched or outperformed state-of-the-art baselines in cell-type classification and in silico perturbation tasks, achieving 95.8% accuracy on mouse kidney data versus 94.9% from the best existing model. It also successfully identified key regulatory genes validated by in vivo studies. By enabling scalable cross-species transcriptomic modeling, Mix-Geneformer offers a powerful tool for comparative transcriptomics and translational applications. While our results demonstrate strong performance, we also acknowledge limitations, such as the computational cost and variability in zero-shot transfer.
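
The two training ingredients named in the abstract, rank-value encoding and a hybrid MLM plus SimCSE-style objective, can be illustrated with a minimal pure-Python sketch. This is not the authors' implementation. It assumes Geneformer-style ranking (counts scaled by cell depth, then divided by a corpus-wide per-gene median), an InfoNCE contrastive term over cosine similarities, and a simple weighted sum of the two losses. The function names, the `max_len` and `weight` parameters, and the example gene values are all hypothetical; the abstract does not state how the two losses are weighted.

```python
import math


def rank_value_encode(counts, gene_medians, max_len=2048):
    """Order genes by median-normalized expression, highest first.

    Assumption: normalization follows the Geneformer-style recipe
    (depth scaling, then division by a corpus-wide per-gene median).
    """
    total = sum(counts.values())
    if total == 0:
        return []
    scored = []
    for gene, count in counts.items():
        if count > 0 and gene in gene_medians:
            # Genes with a low typical expression are pushed up the ranking.
            scored.append((count / total / gene_medians[gene], gene))
    # Descending score; ties are broken by gene name for determinism.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [gene for _, gene in scored[:max_len]]


def simcse_loss(view_a, view_b, temperature=0.05):
    """InfoNCE loss: row i of view_a is the positive for row i of view_b.

    In SimCSE the two views are embeddings of the same input under
    different dropout masks; here they are plain lists of floats.
    """
    def cosine(u, v):
        dot = sum(x * y for x, y in zip(u, v))
        norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
        return dot / norm

    losses = []
    for i, anchor in enumerate(view_a):
        logits = [cosine(anchor, other) / temperature for other in view_b]
        peak = max(logits)  # subtract the max for a stable log-sum-exp
        log_denominator = peak + math.log(sum(math.exp(l - peak) for l in logits))
        losses.append(log_denominator - logits[i])
    return sum(losses) / len(losses)


def hybrid_loss(mlm_loss, contrastive_loss, weight=1.0):
    """Weighted sum of the two objectives; `weight` is a hypothetical knob."""
    return mlm_loss + weight * contrastive_loss


# Illustrative values only, not taken from the paper.
counts = {"ACTB": 100, "GATA4": 5, "TTN": 20, "XIST": 0}
medians = {"ACTB": 50.0, "GATA4": 1.0, "TTN": 5.0, "XIST": 2.0}
print(rank_value_encode(counts, medians))  # ['GATA4', 'TTN', 'ACTB']
```

In the example, the ubiquitously expressed ACTB has the highest raw count but lands last after median normalization, while the low-count GATA4 ranks first. This is the sense in which a rank-value encoding de-emphasizes housekeeping genes in favor of genes that distinguish cell state.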
