DSRPGO: Multimodal Protein Function Prediction
- The paper introduces DSRPGO, a dual-stage deep learning framework that integrates spatial and sequence modalities for improved multi-label protein function prediction.
- The method employs reconstructive pre-training, bidirectional attention, and dynamic selection to robustly fuse heterogeneous protein data from various sources.
- Evaluation on GO ontologies shows significant gains over baselines, validating DSRPGO’s potential to advance functional genomics and proteomics.
DSRPGO (Dynamic Selection and Reconstructive Pre-training with Genetic Optimization) is a dual-stage, dual-branch deep learning framework for multimodal protein function prediction, designed to address the challenge of predicting function from heterogeneous biological data modalities. DSRPGO integrates spatial, sequence, and functional protein data through reconstructive pre-training, cross-modal bidirectional attention, and an adaptive dynamic selection mechanism. This approach achieves state-of-the-art results on hierarchical, multi-label classification of protein function across the Gene Ontology (GO) ontologies: Biological Process (BPO), Molecular Function (MFO), and Cellular Component (CCO), modernizing the predictive pipeline for functional genomics and proteomics (Luo et al., 6 Nov 2025).
1. Model Architecture and Modalities
DSRPGO uses a two-stage pipeline. In the first (pre-training) stage, separate encoder–decoder pairs process (a) protein spatial structural information (PSSI), encoding both protein–protein interaction (PPI) networks and bag-of-words features for subcellular localization and domains, and (b) protein sequence information (PSeI), using pretrained token embeddings from ProtT5. Each branch learns a fine-grained, low-semantic feature space via reconstruction losses. In the second (fine-tuning) stage, these encoders initialize a dual-branch classification system with a Multimodal Shared Learning (MSL) branch that aggregates all modalities, and a Multimodal Interactive Learning (MIL) branch containing the Bidirectional Interaction Module (BInM) for explicit cross-modal attention between sequence and spatial features.
Each branch produces three channel outputs—PPI, attribute, and sequence feature vectors—totaling six feature vectors per protein for downstream fusion. This dual-branch design enables both global integration (through shared learning) and detailed, bidirectional cross-modal exchange.
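The dual-branch, six-channel layout can be sketched minimally in NumPy; the feature width, the random linear maps standing in for the branch sub-networks, and the function names are illustrative assumptions, not the paper's implementation:

```python
import numpy as np

rng = np.random.default_rng(0)
d = 16  # hypothetical shared feature width

# One random linear map per input channel (illustrative stand-ins for the
# branch sub-networks; sizes and names are assumptions).
maps = {k: rng.normal(scale=0.1, size=(d, d)) for k in ("ppi", "attr", "seq")}

def branch(ppi, attr, seq):
    # Each branch emits three channel vectors: PPI, attribute, and sequence.
    return [np.tanh(ppi @ maps["ppi"]),
            np.tanh(attr @ maps["attr"]),
            np.tanh(seq @ maps["seq"])]

ppi, attr, seq = (rng.normal(size=d) for _ in range(3))
msl_out = branch(ppi, attr, seq)   # Multimodal Shared Learning branch
mil_out = branch(ppi, attr, seq)   # Multimodal Interactive Learning branch
channels = msl_out + mil_out       # six feature vectors per protein
```

In the real model the two branches differ (the MIL branch inserts BInM attention between modalities); the sketch only shows how six per-protein vectors arise for downstream fusion.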
2. Reconstructive Pre-training and Encoders
Reconstructive pre-training is central to DSRPGO. The goal is to learn encoders that extract more informative and fine-grained features from both spatial and sequence modalities:
a) PSSI encoder–decoder:
- Input: concatenated spatial feature vector x_s = [x_PPI; x_attr], with x_PPI the flattened PPI adjacency matrix and x_attr the concatenated bag-of-words localization and domain features.
- Internals: Uses BiMamba blocks built from state-space models (SSM) and selective scanning, combining forward and backward scan layers, linear mapping, and FiLM-like gating.
- Output: encoded latent vector z_s, decoded to reconstruct the input via a mirrored decoder.
- Loss: binary cross-entropy on the reconstruction, L_PSSI = −Σ_i [x_i log x̂_i + (1 − x_i) log(1 − x̂_i)].
b) PSeI encoder–decoder:
- Input: ProtT5-embedded sequence representation x_seq.
- Architecture: MLP followed by six-layer Transformer encoder, symmetric decoder.
- Loss: binary cross-entropy on the reconstruction, L_PSeI, of the same form as in the PSSI branch.
This reconstructive formulation ensures both modalities are deeply encoded before fine-tuning on functional labels.
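As a concrete illustration, here is a minimal NumPy sketch of one reconstructive encoder–decoder with a BCE objective; the single linear encoder/decoder and all dimensions are assumptions standing in for the BiMamba and Transformer stacks:

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def bce(x, x_hat, eps=1e-8):
    # Binary cross-entropy between input features and their reconstruction.
    return -np.mean(x * np.log(x_hat + eps) + (1 - x) * np.log(1 - x_hat + eps))

d_in, d_z = 64, 16  # hypothetical input / latent widths
W_enc = rng.normal(scale=0.1, size=(d_in, d_z))
W_dec = rng.normal(scale=0.1, size=(d_z, d_in))  # mirrored decoder

# Binary spatial features (flattened PPI row + bag-of-words attributes).
x = (rng.random((8, d_in)) > 0.5).astype(float)
z = np.tanh(x @ W_enc)        # encode to the latent space
x_hat = sigmoid(z @ W_dec)    # decode / reconstruct the input
loss = bce(x, x_hat)
```

Minimizing this reconstruction loss is what forces the encoder to retain fine-grained feature information before any function labels are seen.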
3. Bidirectional Interaction and Dynamic Selection
Bidirectional Interaction Module (BInM):
Enables cross-attention between spatial and sequence representations in the MIL-Branch. After initial MLP mapping and splitting into multi-head attention space, BInM performs bidirectional, pairwise attention and projects the attended outputs back to their original feature spaces. For input sets S (PPI + attribute) and Q (sequence), BInM models both directions via scaled dot-product cross-attention: S attends to Q, and Q attends to S.
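The two attention directions can be sketched with single-head scaled dot-product attention in NumPy; this simplifies the multi-head BInM, and all sizes and weight matrices are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(a, axis=-1):
    e = np.exp(a - a.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_attn(q_src, kv_src, Wq, Wk, Wv):
    # Queries come from one modality; keys and values from the other.
    Q, K, V = q_src @ Wq, kv_src @ Wk, kv_src @ Wv
    scores = softmax(Q @ K.T / np.sqrt(Q.shape[-1]))
    return scores @ V

d = 32
S = rng.normal(size=(4, d))   # spatial tokens (PPI + attribute)
Qs = rng.normal(size=(6, d))  # sequence tokens
Wq, Wk, Wv = (rng.normal(scale=0.1, size=(d, d)) for _ in range(3))

s_att = cross_attn(S, Qs, Wq, Wk, Wv)  # spatial attends to sequence
q_att = cross_attn(Qs, S, Wq, Wk, Wv)  # sequence attends to spatial
```

Each attended output keeps the token count of its query side, so it can be projected back into that modality's original feature space.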
Dynamic Selection Module (DSM):
Fuses the six output channels (three from MSL, three from MIL) using adaptive weighting. For each protein, DSM computes an "expert confidence" per channel via an MLP followed by a softmax over channels. Channels with confidence above a threshold are selected, their weights renormalized, and each surviving channel is passed through an "expert" layer. The concatenated output feeds the classifier, making the fusion adaptive on a per-protein basis.
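The confidence-gated fusion admits a minimal NumPy sketch, assuming a linear gate in place of the MLP, identity "expert" layers, and a threshold of 0.10 (all illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(a):
    e = np.exp(a - a.max())
    return e / e.sum()

def dynamic_select(channels, w_gate, tau=0.10):
    # channels: six per-protein feature vectors (three from MSL, three from MIL).
    feats = np.stack(channels)         # (n_channels, d)
    conf = softmax(feats @ w_gate)     # one "expert confidence" per channel
    keep = conf >= tau                 # drop low-confidence channels
    w = np.where(keep, conf, 0.0)
    w = w / w.sum()                    # renormalize over surviving channels
    return np.concatenate(w[:, None] * feats)  # fixed-size fused vector

d = 8
channels = [rng.normal(size=d) for _ in range(6)]
w_gate = rng.normal(scale=0.1, size=d)   # stand-in for the gating MLP
fused = dynamic_select(channels, w_gate)
```

With six channels the softmax maximum is always at least 1/6, so at least one channel survives any threshold below that value and the renormalization is well defined.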
4. Hierarchical Multi-Label Classification and Losses
DSRPGO treats each GO ontology (BPO, MFO, CCO) as an independent multi-label classification task. The classifier is trained on the DSM-fused features with a focal-style asymmetric cross-entropy loss that down-weights easy negatives to accommodate the severe class imbalance of GO labels. There is no explicit hierarchy-aware regularization; each ontology is handled independently.
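One common asymmetric focal formulation is sketched below; the separate focusing exponents for positive and negative labels, and the specific values, are assumptions, since the paper's exact loss is not reproduced here:

```python
import numpy as np

def asymmetric_focal_bce(y, p, gamma_pos=0.0, gamma_neg=2.0, eps=1e-8):
    # Focal-style BCE with separate focusing exponents for positive and
    # negative labels; p**gamma_neg shrinks the abundant easy negatives.
    pos = y * (1.0 - p) ** gamma_pos * np.log(p + eps)
    neg = (1.0 - y) * p ** gamma_neg * np.log(1.0 - p + eps)
    return -np.mean(pos + neg)

y = np.array([1.0, 0.0, 0.0, 1.0])   # toy multi-label targets
p = np.array([0.9, 0.1, 0.4, 0.6])   # predicted probabilities
loss = asymmetric_focal_bce(y, p)
```

Setting gamma_neg > gamma_pos encodes the asymmetry: confident negatives contribute almost nothing, so the gradient budget is spent on the rare positive GO labels.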
5. Training, Implementation, and Hyperparameters
Data Preparation
- Pretraining: 19,385 human proteins (PPI from STRING v11.5; sequence, localization, domain from UniProt v3.5.175)
- Fine-tuning: Split by timestamp into train/val/test for each ontology (e.g., BPO: 3,197/304/182)
Hyperparameters
- Pre-training: AdamW, 5,000 epochs, stepped learning-rate schedule, dropout 0.1
- Fine-tuning: AdamW, 100 epochs, stepped learning-rate schedule, dropout 0.3
- Hardware: NVIDIA RTX 4090 or better (≥16 GB VRAM)
- Training time: ~24–48 hours (pre-training), ~1–3 hours per ontology (fine-tuning)
Implementation Notes
- Frozen ProtT5 for sequence, random (Xavier) init for MLPs, BiMamba, Transformers
- The full process is encapsulated in reproducible pseudocode for pre-training, fine-tuning, and inference as described in (Luo et al., 6 Nov 2025).
6. Evaluation and Comparative Performance
Performance is assessed on held-out test sets for each GO ontology using Fmax, micro/macro AUPR, and accuracy:

| Ontology | CFAGO (best baseline) | DSRPGO |
|---|---|---|
| BPO Fmax | 0.439 ± 0.007 | 0.458 ± 0.006 |
| MFO Fmax | 0.236 ± 0.004 | 0.254 ± 0.022 |
| CCO Fmax | 0.366 ± 0.018 | 0.452 ± 0.019 |
Ablation studies reveal that omitting reconstructive pre-training, the Bidirectional Interaction Module, or the Dynamic Selection Module substantially reduces Fmax; for example, "no pre-training" scores 0.297/0.167/0.356 (BPO/MFO/CCO), and removing BInM or DSM yields drops of up to 0.07 on MFO and CCO. Sequence-only and spatial-only baselines underperform the dual-modality model.
Standard deviations across five runs indicate stable training; paired t-tests are not reported, but the improvements are consistent across ontologies and runs.
7. Significance, Context, and Implications
DSRPGO establishes a new paradigm for multimodal protein function prediction by coupling the strengths of reconstructive pre-training, dynamic cross-modal feature interaction, and adaptive channel selection. By explicitly disentangling modality-specific representations and enabling selective fusion, DSRPGO achieves consistent improvement over strong multimodal and unimodal baselines in hierarchical GO label prediction.
This methodology is applicable particularly to eukaryotic function prediction tasks where context-rich, heterogeneous protein attributes are available. The reconstructive pre-training strategy anchors the framework’s generalizability, while DSM ensures context-aware inference. Comparison with generative LLM models for protein QA (Xiao et al., 21 Aug 2024) and contrastive/modal alignment paradigms (Wang et al., 24 May 2024) highlights DSRPGO’s focus on direct discriminative function label prediction under complex multi-label, multi-ontology settings.
A plausible implication is that further extensions—such as joint optimization with domain-embedding or family-aware graph modalities—may yield additional functional gains, particularly in settings with extreme class imbalance or limited experimental annotations. The modular pipeline and explicit channel weighting in DSRPGO provide routes for continual learning, hard-negative mining, and integration with knowledge graph-enhanced ontological priors.