Homologous Latents Fusion in PSSP & Video Restoration

Updated 5 February 2026

Homologous Latents Fusion is a methodology for integrating latent representations from models with matching architectures, enabling enhanced prediction accuracy.
In protein secondary structure prediction, it fuses low-quality evolutionary profiles with BERT-derived pseudo-profiles to reliably predict residue properties in low-homology conditions.
For zero-shot video restoration, it dynamically blends image and video diffusion model latents using an adaptive Chain-of-Thought strategy to maintain spatial detail and temporal consistency.

Homologous Latents Fusion is a class of methods for integrating latent representations, derived from models with identical or nearly identical architectures, within a unified latent space. The two principal variants target protein secondary structure prediction (PSSP) for low-homology proteins and zero-shot video restoration with diffusion models, using data-driven or model-driven fusion of homologous latent vectors. This entry surveys both lines: residue-wise profile fusion for protein sequences (Wang et al., 2021) and framewise latent fusion for temporally consistent video restoration (Cao et al., 29 Jan 2026).

1. Definition and Conceptual Foundation

Homologous Latents Fusion covers settings in which at least two latent representations, produced by distinct but structurally aligned models (e.g., a protein BERT and profiles built from shallow multiple sequence alignments (MSAs), or image and video diffusion models built on a shared VAE), are combined using adaptive or convex weighting. In PSSP, it refers to residue-level fusion of weak evolutionary profiles with pseudo-profiles derived from external knowledge. In vision diffusion, it denotes the linear combination of latents from an image restoration/enhancement (IR/IE) model and a text-to-video (T2V) model that share the same VAE latent space, performed synchronously at each step of the diffusion trajectory.

2. Homologous Latents Fusion in Protein Secondary Structure Prediction

For low-homology proteins, evolutionary profiles constructed from MSAs are often unreliable due to small sample sizes. The homologous latents fusion approach ("Adaptive Residue-wise Profile Fusion") (Wang et al., 2021) addresses this by combining:

Low-quality profile ($P_{\ell}\in\mathbb{R}^{L\times 20}$): Derived from shallow MSAs, defined as $P_{\ell}[i,a] = \frac{F[i,a]+\theta}{N+\theta}$.
BERT-derived pseudo-profile ($P_b\in\mathbb{R}^{L\times 20}$): Obtained by masking and probing a pretrained protein BERT, producing an implicit residue distribution based on global protein sequence knowledge.
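As a rough illustration of the low-quality profile formula above, the sketch below applies it entry by entry to a toy count matrix. It assumes that $F[i,a]$ is the count of amino acid $a$ at position $i$ in the MSA, that $N$ is the number of aligned sequences, and that $\theta$ is a smoothing constant; the source does not spell these out, and the function name `low_quality_profile` is hypothetical. The formula is applied exactly as stated, with no extra row normalisation.

```python
def low_quality_profile(counts, n_seqs, theta=1.0):
    """Apply P[i][a] = (F[i][a] + theta) / (N + theta) entry by entry.

    counts : L x 20 nested list, assumed to hold per-position residue counts F
    n_seqs : assumed to be N, the number of aligned sequences
    theta  : smoothing constant from the formula (the default is arbitrary)
    """
    denom = n_seqs + theta
    if denom <= 0:
        raise ValueError("N + theta must be positive")
    return [[(f + theta) / denom for f in row] for row in counts]


# Toy shallow MSA: 2 residues, 3 aligned sequences, 20 amino-acid columns.
counts = [
    [3, 0] + [0] * 18,  # position 0: all three sequences agree
    [1, 2] + [0] * 18,  # position 1: split between two residues
]
profile = low_quality_profile(counts, n_seqs=3, theta=1.0)
print(profile[1][:2])  # [0.5, 0.75]
```

With so few sequences, a single extra observation moves an entry by a large amount, which is the unreliability that the fusion step is meant to compensate for.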

An adaptive fusion is performed at each residue $i$:

$\mathbf{h}_i = \alpha_i^{(1)} \mathbf{z}_i^{(1)} + \alpha_i^{(2)} \mathbf{z}_i^{(2)},\quad \mathbf{z}_i^{(1)} = P_{\ell}[i],\;\mathbf{z}_i^{(2)} = P_{b}[i]$

where weights $(\alpha_i^{(1)},\alpha_i^{(2)})$ are inferred from a grading network conditioned on the accuracy of auxiliary PSSP heads for both channels. Supervision is provided by pseudo-labels computed from per-residue cross-entropy errors, penalizing deviation in log space.
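The residue-wise fusion can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the grading network is replaced by two scalar scores per residue, and turning them into weights with a softmax is an assumption made here so that the two weights are positive and sum to one. The names `grading_weights` and `fuse_profiles` are hypothetical.

```python
import math


def grading_weights(score_low, score_bert):
    """Map two grading scores to (alpha_1, alpha_2) with a softmax (assumed)."""
    m = max(score_low, score_bert)  # subtract the max for numerical stability
    e1, e2 = math.exp(score_low - m), math.exp(score_bert - m)
    return e1 / (e1 + e2), e2 / (e1 + e2)


def fuse_profiles(p_low, p_bert, weights):
    """Compute h_i = alpha_i1 * P_low[i] + alpha_i2 * P_bert[i] for every residue i."""
    if not (len(p_low) == len(p_bert) == len(weights)):
        raise ValueError("profiles and weights must cover the same residues")
    fused = []
    for z1, z2, (a1, a2) in zip(p_low, p_bert, weights):
        fused.append([a1 * x + a2 * y for x, y in zip(z1, z2)])
    return fused


# Toy example with 2 residues and 3 columns (real profiles have 20 columns).
p_low = [[0.9, 0.1, 0.0], [0.3, 0.3, 0.4]]
p_bert = [[0.5, 0.4, 0.1], [0.1, 0.8, 0.1]]
weights = [grading_weights(2.0, 0.0),   # residue 0: trust the MSA profile more
           grading_weights(-1.0, 1.0)]  # residue 1: trust the BERT profile more
fused = fuse_profiles(p_low, p_bert, weights)
```

Because the weights are chosen per residue, a position with a usable evolutionary signal can lean on $P_{\ell}$ while a neighbouring position with none leans on $P_b$.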

A feature consistency loss $\mathcal{L}_f$ ensures that the fused representation remains semantically aligned with true high-quality profiles by matching BiLSTM features, KL-divergence between predicted softmax distributions, and final PSSP cross-entropy.
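One way to picture how the three terms combine is the per-residue sketch below. Several choices are assumptions rather than details from the source: mean squared error stands in for the BiLSTM feature-matching term, the KL divergence is taken from the reference (high-quality profile) prediction to the fused prediction, and the three terms are summed with hypothetical weights `w_feat`, `w_kl`, and `w_ce`.

```python
import math


def softmax(logits):
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def mse(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def consistency_loss(feat_fused, feat_ref, logits_fused, logits_ref, label,
                     w_feat=1.0, w_kl=1.0, w_ce=1.0, eps=1e-12):
    """Feature matching + prediction KL + PSSP cross-entropy for one residue."""
    p_fused, p_ref = softmax(logits_fused), softmax(logits_ref)
    feature_term = mse(feat_fused, feat_ref)        # assumed form of feature matching
    kl_term = kl_divergence(p_ref, p_fused)         # assumed direction of the KL
    ce_term = -math.log(p_fused[label] + eps)       # cross-entropy on the true class
    return w_feat * feature_term + w_kl * kl_term + w_ce * ce_term


# Toy residue with 3 secondary-structure classes and 4-dimensional features.
loss = consistency_loss(
    feat_fused=[0.2, 0.1, 0.0, 0.5], feat_ref=[0.3, 0.1, 0.1, 0.4],
    logits_fused=[1.0, 0.2, -0.5], logits_ref=[1.5, 0.0, -1.0], label=0,
)
```

When the fused branch reproduces the reference features and predictions exactly, the first two terms vanish and only the cross-entropy remains.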

This residue-wise mechanism is especially effective for orphan sequences, as the BERT pseudo-profile supplies informative priors while the adaptive fusion weight allocates confidence locally according to evolutionary signal quality.

3. Homologous Latents Fusion in Diffusion-Based Video Restoration

In the context of zero-shot video restoration (Cao et al., 29 Jan 2026), homologous latents fusion capitalizes on the architectural alignment between state-of-the-art image restoration models and video diffusion models such as Zeroscope (SD v1.5-based). Both operate in an identical VAE latent space, enabling direct convex fusion.

At each reverse-diffusion timestep $t$:

$z_t^{F1} = (1 - \lambda_t^{F1}) \cdot z_t^I + \lambda_t^{F1} \cdot z_t^{V1}$

where $z_t^I$ is the IR model's denoising latent, $z_t^{V1}$ is the homologous T2V model's latent, and $\lambda_t^{F1}\in[0,1]$ is a dynamically selected fusion ratio.

Both models advance with the fused latent, ensuring framewise and temporal consistency. The overall pipeline is training-free and agnostic to the specific IR method.
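The fusion rule and the shared update can be sketched as a short loop. This is a toy illustration under stated assumptions: latents are flattened to plain lists of floats, and `image_step` and `video_step` are hypothetical callables standing in for one denoising step of the IR and T2V models. No real diffusion library API is implied.

```python
def fuse_latents(z_image, z_video, lam):
    """Convex fusion z_F = (1 - lam) * z_I + lam * z_V in a shared latent space."""
    if not 0.0 <= lam <= 1.0:
        raise ValueError("fusion ratio must lie in [0, 1]")
    if len(z_image) != len(z_video):
        raise ValueError("latents must have the same shape")
    return [(1.0 - lam) * zi + lam * zv for zi, zv in zip(z_image, z_video)]


def fused_reverse_diffusion(z_init, image_step, video_step, lambdas):
    """Run both models from the same fused latent at every reverse step.

    lambdas[k] is the fusion ratio used at the k-th reverse step.
    """
    z = list(z_init)
    num_steps = len(lambdas)
    for k, lam in enumerate(lambdas):
        t = num_steps - 1 - k           # timesteps count down to 0
        z_i = image_step(z, t)          # IR model's denoised latent
        z_v = video_step(z, t)          # T2V model's denoised latent
        z = fuse_latents(z_i, z_v, lam) # both models continue from this latent
    return z


# Toy stand-ins for the two denoisers (hypothetical, for illustration only).
image_step = lambda z, t: [v + 1.0 for v in z]
video_step = lambda z, t: [v - 1.0 for v in z]
result = fused_reverse_diffusion([0.0, 2.0], image_step, video_step, [0.5, 0.5])
```

Setting the ratio to 0 recovers the IR model's latent unchanged and setting it to 1 recovers the T2V model's latent, so the ratio directly controls how much temporal prior is injected at each step.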

4. Dynamic Fusion Ratio Selection: Chain-of-Thought (COT) Strategy

A principal challenge in homologous latent fusion is determining the fusion weight $\lambda_t^{F1}$ for optimal trade-off between spatial detail (IR) and temporal smoothness (T2V). The adaptive COT-based search operates as follows:

At each timestep $t$, candidate weights centered on the previous step's $\lambda_t^c$ are sampled across a small interval.
For each candidate, fused latent $z_t^{F1}(\lambda)$ is decoded to produce a video segment.
Perceptual (CLIP-IQA) and temporal (Warp Error, WE) metrics are used to rank all candidates; the sum of their ranks determines the optimal $\lambda_t^{F1}$.
This process maintains the stability of spatial details while substantially suppressing flicker.

This strategy replaces heuristic fusion with metric-driven adaptive mixing and is extensible to any diffusion-based method leveraging a shared latent space.
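The rank-sum selection described in the steps above can be sketched as follows. The metric values are supplied as plain lists, since computing CLIP-IQA and Warp Error requires decoding each candidate, which is outside the scope of this sketch. The interval half-width, the number of candidates, and the tie-breaking rule (earliest candidate wins) are assumptions, and the function names are hypothetical.

```python
def candidate_ratios(center, half_width=0.1, num=5):
    """Evenly spaced candidates around `center`, clipped to [0, 1]."""
    if num < 2:
        return [min(1.0, max(0.0, center))]
    step = 2.0 * half_width / (num - 1)
    raw = [center - half_width + k * step for k in range(num)]
    return sorted(set(min(1.0, max(0.0, r)) for r in raw))


def ranks(values, higher_is_better):
    """Rank 0 is best; later ranks are worse."""
    order = sorted(range(len(values)), key=lambda i: values[i],
                   reverse=higher_is_better)
    out = [0] * len(values)
    for rank, idx in enumerate(order):
        out[idx] = rank
    return out


def select_ratio(candidates, quality_scores, warp_errors):
    """Pick the candidate with the smallest sum of quality rank and warp rank."""
    q_rank = ranks(quality_scores, higher_is_better=True)   # CLIP-IQA: higher is better
    w_rank = ranks(warp_errors, higher_is_better=False)     # Warp Error: lower is better
    totals = [q + w for q, w in zip(q_rank, w_rank)]
    best = min(range(len(candidates)), key=lambda i: totals[i])
    return candidates[best]


# Toy metric values for four candidates (made up for illustration).
cands = [0.3, 0.4, 0.5, 0.6]
quality = [0.60, 0.65, 0.70, 0.50]
warp = [0.90, 0.60, 0.45, 0.40]
chosen = select_ratio(cands, quality, warp)
print(chosen)  # 0.5
```

Ranking rather than summing raw scores avoids having to put CLIP-IQA and Warp Error on a common scale, which is one plausible reason for the design.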

5. Empirical Performance and Ablation Results

Protein PSSP (Wang et al., 2021):

On the BC40 set, in the extremely low-homology regime (MSA count $=0$):

| Method | PSSP Accuracy (%) |
|---|---|
| Low-quality profile | 68.2 |
| Bagging (SOTA) | 70.8 |
| Fusion + consistency | 75.5 |

The fusion model gains $4.7$ percentage points over Bagging and $7.3$ over the raw profile. Improvements persist for MSA counts $<10$, $<30$, and $<60$, and on other benchmarks.

Video Restoration (Cao et al., 29 Jan 2026):

Ablation on 4× blind video super-resolution (SR), DAVIS benchmark (DiffBIR backbone):

| Configuration | HMLF | COT | WE↓ | t-LPIPS↓ | PSNR↑ | SSIM↑ |
|---|---|---|---|---|---|---|
| Baseline | ✗ | ✗ | 0.806 | 3.92 | 26.50 | 0.6869 |
| + HMLF only | ✔ | ✗ | 0.696 | 3.36 | 26.69 | 0.6981 |
| + All modules | ✔ | ✔ | 0.376 | 0.41 | 27.42 | 0.7388 |

On zero-shot 4× SR with the PSLD backbone, homologous latents fusion (HMLF) alone reduces WE from $0.8408$ to $\sim 0.65$ and t-LPIPS from $6.28$ to $\sim 3.8$. With the full pipeline (including dynamic COT), WE drops further to $0.236$ and t-LPIPS to $0.62$, while PSNR and SSIM are preserved or improved.
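To put the temporal metrics on a common footing, the short calculation below converts the baseline and full-pipeline figures reported above into relative reductions. It uses only numbers quoted in this section; the helper `relative_reduction` is a hypothetical name introduced for the example.

```python
def relative_reduction(before, after):
    """Percentage by which a lower-is-better metric drops from `before` to `after`."""
    return 100.0 * (before - after) / before


# (baseline, full pipeline) pairs taken from the results quoted above.
reported = {
    "DiffBIR WE": (0.806, 0.376),
    "DiffBIR t-LPIPS": (3.92, 0.41),
    "PSLD WE": (0.8408, 0.236),
    "PSLD t-LPIPS": (6.28, 0.62),
}

for name, (before, after) in reported.items():
    print(f"{name}: {relative_reduction(before, after):.1f}% lower")
```

On both backbones the full pipeline cuts Warp Error by more than half and t-LPIPS by roughly nine tenths relative to the baseline.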

6. Comparative Context and Distinction from Heterogeneous Latent Fusion

Homologous latents fusion is fundamentally distinct from heterogeneous latents fusion, which addresses cases where latent spaces cannot be directly blended due to architectural discrepancies (e.g., 3D autoencoders in advanced T2V models versus 2D image VAEs). In such cases, the latents must first be decoded and re-encoded via a compatible VAE before they can be fused, at significant computational cost and with additional potential for information loss. In contrast, homologous latents fusion exploits the shared latent space to fuse directly, which is computationally cheaper and avoids the decode and re-encode round trip, in both protein informatics and video restoration pipelines.

7. Significance and Future Directions

Homologous latents fusion provides a mathematically principled, empirically validated approach for integrating complementary modalities or priors when models share an aligned latent space. In protein structure prediction, it enables accurate inference in the low-homology regime by dynamically exploiting global and local sequence information. In diffusion video restoration, it forms the backbone of state-of-the-art, training-free pipelines with substantially reduced temporal flicker and enhanced spatial detail. A plausible implication is broader applicability to multimodal and transfer learning settings where model architectures can be aligned at the latent level, potentially yielding new paradigms in knowledge fusion and cross-domain adaptation (Wang et al., 2021; Cao et al., 29 Jan 2026).

References (2)

Adaptive Residue-wise Profile Fusion for Low Homologous Protein Secondary Structure Prediction Using External Knowledge (2021)

Zero-Shot Video Restoration and Enhancement with Assistance of Video Diffusion Models (2026)
