Dual-Branch Diffusion Framework

Updated 26 September 2025

The framework explicitly decouples clean signal and artifact modeling using dual branches in a denoising diffusion process.
It leverages branch-specific conditioning and joint posterior sampling to enhance reconstruction fidelity and temporal consistency.
In empirical benchmarks, particularly EEG denoising, it attains high correlation with the clean signal and reduced relative root-mean-square error (RRMSE).

In the context of denoising diffusion models, a dual-branch diffusion framework is any architectural or algorithmic strategy that instantiates two distinct processing “branches”. The branches may serve separate data modalities, separate source and target constraints, or the explicit modeling of signal and artifact (or noise), and their outputs are either fused or used in tandem to improve generative, discriminative, or denoising performance. The approach decouples aspects of the generative or analytical process, such as foreground and background conditioning, clean and noisy components, or multi-modal information, so that the structure of complex signals is modeled and disentangled explicitly for improved learning, generalization, or interpretability.

1. Core Principles and Architectural Variants

In a dual-branch diffusion framework, the stochastic dynamical process underlying generative modeling (e.g., the iterative denoising of a corrupted input) is implemented with two parallel and usually interacting computational paths, each designed to model a specific aspect of the data or the target output. The branches may differ in terms of architectural details, conditioning mechanisms, or modality-specific operations, and commonly include:

Separate Modeling of Signal and Artifact: One branch is dedicated to reconstructing the “clean” component (e.g., the genuine EEG signal), while the other explicitly models the artifact (e.g., EOG, EMG, ECG noise). Each is parameterized by an independent denoising diffusion model, but they share conditioning or modulation layers to facilitate interaction and guidance (Shao et al., 17 Sep 2025).
Branch-Specific Conditioning: Each branch receives distinct conditional inputs (e.g., noise-level embedding, artifact type, or task-related variables) via mechanisms such as Dual-FiLM (dual feature-wise linear modulation) to regulate feature extraction and network response.
Joint Posterior Sampling: The outputs are merged or reconciled via a sampling strategy that ensures joint consistency with the observed data by solving for the most probable decomposition under their respective priors and the observed mixture.

This architectural separation enables explicit disentanglement of correlated or confounding data sources (such as signal and artifact) and allows each branch to be optimized for its particular generative or discriminative role.

2. Mathematical Formulation and Conditional Diffusion Process

Let $y$ denote the observed signal (e.g., contaminated EEG), modeled as a linear or otherwise defined combination of a clean component $x$ and an artifact $x'$ under a mixing parameter $\lambda_{\text{SNR}}$:

$y = x + \lambda_{\text{SNR}} x'$
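As an illustration of the mixing model, the sketch below synthesizes a contaminated segment at a target SNR. The helper names (`rms`, `mix_at_snr`), the toy signals, and the SNR convention (20 log10 of the RMS ratio) are assumptions made for this example; benchmarks differ in the constant they use, and the source does not state which one applies.

```python
import numpy as np

def rms(v):
    """Root-mean-square amplitude of a 1-D signal."""
    return np.sqrt(np.mean(np.square(v)))

def mix_at_snr(clean, artifact, snr_db):
    """Return (y, lam) with y = clean + lam * artifact at the requested SNR.

    Assumed convention: SNR_dB = 20 * log10(rms(clean) / rms(lam * artifact)).
    """
    lam = rms(clean) / (rms(artifact) * 10.0 ** (snr_db / 20.0))
    return clean + lam * artifact, lam

rng = np.random.default_rng(0)
t = np.arange(512) / 256.0
clean = np.sin(2.0 * np.pi * 10.0 * t)      # toy stand-in for a clean EEG segment
artifact = rng.standard_normal(t.size)      # toy stand-in for an artifact segment

y, lam = mix_at_snr(clean, artifact, snr_db=-5.0)
achieved_db = 20.0 * np.log10(rms(clean) / rms(lam * artifact))
print(f"lambda_SNR = {lam:.4f}, achieved SNR = {achieved_db:.2f} dB")
```

Lower SNR targets yield a larger mixing parameter, so the artifact dominates the observation.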

Each branch trains a separate denoising diffusion probabilistic model—one for $x$ (the EEG branch), and another for $x'$ (the artifact branch):

The EEG branch trains a denoiser $\epsilon_\theta$ to estimate the conditional posterior $p_\theta(x_0|y)$ using a noise-injection and denoising objective:

$L_{\text{EEG}} = \mathbb{E}_{x_0, y, \sqrt{\bar{\alpha}_t}^*, \varepsilon, z}\left[\lVert \varepsilon - \epsilon_\theta(\sqrt{\bar{\alpha}_t}x_0 + \sqrt{1-\bar{\alpha}_t}\varepsilon, y, \sqrt{\bar{\alpha}_t}^*, z)\rVert_1\right]$

where $\varepsilon$ is standard Gaussian noise, $z$ indexes the artifact type, and $\sqrt{\bar{\alpha}_t}^*$ is the noise level sampled from the noise schedule.

The artifact branch is trained analogously, with a separate network $\epsilon_\theta'$ for $x'$.
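The training objective above can be sketched for a single sample as follows. This is a minimal illustration, not the authors' implementation: the linear beta schedule, the discrete sampling of the noise level, the function names, and the placeholder denoiser are all assumptions. The mean absolute error used here equals the L1 norm in the objective up to a constant factor.

```python
import numpy as np

def forward_noise(x0, sqrt_alpha_bar, eps):
    """Diffuse x0 to the given noise level: sqrt(a) * x0 + sqrt(1 - a) * eps."""
    return sqrt_alpha_bar * x0 + np.sqrt(1.0 - sqrt_alpha_bar ** 2) * eps

def l1_noise_loss(denoiser, x0, y, sqrt_alpha_bar, z, rng):
    """L1 noise-prediction loss for one sample, conditioned on y, noise level and z."""
    eps = rng.standard_normal(x0.shape)
    x_t = forward_noise(x0, sqrt_alpha_bar, eps)
    eps_hat = denoiser(x_t, y, sqrt_alpha_bar, z)
    return np.mean(np.abs(eps - eps_hat))

rng = np.random.default_rng(0)
betas = np.linspace(1e-4, 0.02, 1000)                  # assumed linear beta schedule
sqrt_alpha_bar_all = np.sqrt(np.cumprod(1.0 - betas))

x0 = np.sin(np.linspace(0.0, 8.0 * np.pi, 512))        # toy clean segment
y = x0 + 0.5 * rng.standard_normal(512)                # toy contaminated observation
level = sqrt_alpha_bar_all[rng.integers(0, betas.size)]

def zero_denoiser(x_t, y, level, z):
    """Placeholder network that always predicts zero noise."""
    return np.zeros_like(x_t)

loss = l1_noise_loss(zero_denoiser, x0, y, level, z=0, rng=rng)
print(f"loss with placeholder denoiser: {loss:.3f}")
```

The artifact branch would use the same routine with $x'$ in place of $x$ and its own network.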

The dual-branch setup is mediated by a joint posterior over $x, x'$:

$p(x, x'|y) \propto p(y|x, x')\, p(x)\, p(x')$

where $p(x), p(x')$ are the priors from the respective diffusion models; $p(y|x, x')$ enforces reconstruction fidelity.
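To make the role of the likelihood term concrete, one can write the negative log-posterior under an isotropic Gaussian likelihood with variance $\sigma^2$. Both the Gaussian form and the symbol $\sigma$ are illustrative assumptions, not stated in the source:

$-\log p(x, x'|y) = \frac{1}{2\sigma^2}\lVert y - (x + \lambda_{\text{SNR}} x')\rVert_2^2 - \log p(x) - \log p(x') + \text{const}$

Under that assumption, maximizing the joint posterior trades off a squared data-mismatch penalty against the two diffusion priors.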

During the sampling stage, both branches perform reverse diffusion in parallel. After each step, a data-consistency residual is computed:

$r = y - (x_0 + \lambda_{\text{SNR}} x_0')$

Estimates are updated via a directional consistency constraint:

$\tilde{x}_0 = x_0 + \lambda_{\text{dc}}\, r \qquad \tilde{x}_0' = x_0' + (1-\lambda_{\text{dc}})\, r$

These adjusted estimates are used as seeds for the next denoising step, facilitating collaborative optimization under the mixing model.
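The consistency correction can be sketched in isolation as below. The function name `consistency_update`, the toy signals, and the chosen values of the two weights are assumptions for illustration; the reverse-diffusion steps that would surround this correction are omitted.

```python
import numpy as np

def consistency_update(x0_hat, a0_hat, y, lam_snr, lam_dc):
    """Apply one data-consistency correction to the two branch estimates.

    x0_hat: current clean-signal estimate; a0_hat: current artifact estimate.
    Returns the corrected estimates and the residual computed before correction.
    """
    r = y - (x0_hat + lam_snr * a0_hat)
    x0_new = x0_hat + lam_dc * r
    a0_new = a0_hat + (1.0 - lam_dc) * r
    return x0_new, a0_new, r

rng = np.random.default_rng(0)
clean = np.sin(np.linspace(0.0, 6.0 * np.pi, 256))
artifact = rng.standard_normal(256)
lam_snr, lam_dc = 0.5, 0.3                      # illustrative values only
y = clean + lam_snr * artifact

# Imperfect branch outputs, standing in for one reverse-diffusion step.
x0_hat = clean + 0.1 * rng.standard_normal(256)
a0_hat = artifact + 0.1 * rng.standard_normal(256)

x0_new, a0_new, r = consistency_update(x0_hat, a0_hat, y, lam_snr, lam_dc)
r_after = y - (x0_new + lam_snr * a0_new)
shrink = (1.0 - lam_dc) * (1.0 - lam_snr)       # factor by which the residual is rescaled
print(np.abs(r).mean(), np.abs(r_after).mean(), shrink)
```

Substituting the update into the residual shows that it is rescaled by $(1-\lambda_{\text{dc}})(1-\lambda_{\text{SNR}})$, so with the update as written the mixture constraint is met exactly only when $\lambda_{\text{SNR}} = 1$ or $\lambda_{\text{dc}} = 1$; otherwise each step reduces the mismatch without eliminating it.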

3. Temporal Modeling and Feature Modulation

The dual-branch architecture is augmented with feature-extraction modules to address both the local temporal dynamics (via convolutional layers) and global dependencies (using Transformer-based blocks). Temporal modeling is critical in domains such as EEG denoising, where both branches must capture:

Short-term waveform fluctuations (for fine noise suppression).
Long-range dependencies (to model artifact patterns or neural event correlations).

Dual-FiLM is employed to modulate feature extraction, leveraging both the noise level and artifact type label. This ensures that each branch adapts to the specific characteristics of the current signal component and artifact class throughout diffusion time.
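The source does not spell out the internal form of Dual-FiLM, so the following is only one plausible reading: two feature-wise affine modulations applied in sequence, one driven by the noise level and one by the artifact-type label. The class name `DualFiLMSketch`, the parameter shapes, and the sequential composition are all hypothetical.

```python
import numpy as np

class DualFiLMSketch:
    """Hypothetical dual feature-wise linear modulation of a (channels, time) map."""

    def __init__(self, channels, num_artifact_types, rng):
        # Linear map from a scalar noise level to per-channel (gamma, beta).
        self.w_noise = 0.1 * rng.standard_normal(2 * channels)
        # Lookup table from artifact-type label to per-channel (gamma, beta).
        self.type_table = 0.1 * rng.standard_normal((num_artifact_types, 2 * channels))

    def __call__(self, h, noise_level, artifact_type):
        g_noise, b_noise = np.split(self.w_noise * noise_level, 2)
        g_type, b_type = np.split(self.type_table[artifact_type], 2)
        # First modulation: noise level. Scales are offset by 1 so zero parameters act as identity.
        h = (1.0 + g_noise)[:, None] * h + b_noise[:, None]
        # Second modulation: artifact type.
        h = (1.0 + g_type)[:, None] * h + b_type[:, None]
        return h

rng = np.random.default_rng(0)
film = DualFiLMSketch(channels=8, num_artifact_types=3, rng=rng)
features = rng.standard_normal((8, 128))        # toy feature map: 8 channels, 128 time steps

out_type0 = film(features, noise_level=0.7, artifact_type=0)
out_type1 = film(features, noise_level=0.7, artifact_type=1)
print(out_type0.shape, np.abs(out_type0 - out_type1).max())
```

The same features are transformed differently for each artifact class and noise level, which is the behavior the text attributes to the modulation.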

4. Theoretical and Practical Advantages

Separating branches for clean and artifact modeling, or for other dual paths such as generation and discrimination, confers several advantages:

Disentanglement: Explicit modeling improves the separation of correlated latent features, yielding more interpretable representations—essential in biomedical and scientific domains.
Generalization Across Artifact Types: By including class label conditioning, D4PM handles multiple artifact forms in a unified training regime, avoiding the limitations of single-artifact paradigms (Shao et al., 17 Sep 2025).
Temporal Consistency: Parallel modeling and sample-wise reconciliation ensure that both local and global temporal structures are preserved.
Interpretability and Over-smoothing Reduction: Dual-branch strategies reduce the over-smoothing that can affect conventional deep learning denoisers, preserving subtle neural activity essential for accurate scientific analysis.

5. Empirical Results and Benchmarking

Validation on public EEG datasets is reported to favor the dual-branch approach over single-branch baselines. For example, D4PM achieves:

Correlation coefficients up to 0.991 for EOG artifact removal at challenging SNRs (e.g., −5 dB).
Substantial reductions in RRMSE in both time and spectral domains (e.g., $\text{RRMSE}_t \approx 0.124$, $\text{RRMSE}_s \approx 0.091$).
State-of-the-art SNR improvement in reconstructed clean EEG.
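For readers unfamiliar with these metrics, the sketch below computes them under commonly used definitions: Pearson correlation, and RMS error relative to the RMS of the reference, in the time domain and on power spectra. The function names and the plain periodogram used as the spectral estimate are assumptions; the evaluation protocol behind the reported numbers may use a different spectral estimator.

```python
import numpy as np

def rms(v):
    """Root-mean-square value of an array."""
    return np.sqrt(np.mean(np.square(v)))

def rrmse_temporal(est, ref):
    """Relative RMS error in the time domain."""
    return rms(est - ref) / rms(ref)

def periodogram(v):
    """Simple power spectrum estimate (assumed here; Welch averaging is another option)."""
    return np.abs(np.fft.rfft(v)) ** 2 / v.size

def rrmse_spectral(est, ref):
    """Relative RMS error between power spectra."""
    return rms(periodogram(est) - periodogram(ref)) / rms(periodogram(ref))

def correlation(est, ref):
    """Pearson correlation coefficient between estimate and reference."""
    return np.corrcoef(est, ref)[0, 1]

rng = np.random.default_rng(0)
ref = np.sin(2.0 * np.pi * 10.0 * np.arange(512) / 256.0)   # toy clean reference
est = ref + 0.1 * rng.standard_normal(ref.size)             # toy denoised output

print(rrmse_temporal(est, ref), rrmse_spectral(est, ref), correlation(est, ref))
```

A perfect reconstruction gives zero for both RRMSE values and a correlation of 1, so lower RRMSE and higher correlation indicate better denoising.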

Comparisons with baselines such as FCNNs and GAN-based or Transformer-based models indicate that D4PM's dual-branch structure leads to better separation and higher-fidelity signal recovery while remaining robust when trained on mixed-artifact datasets.

6. Extensions and General Applicability

The dual-branch framework is extensible to other domains where separation of mixed or multi-modal signals is critical. Applications include:

Image inpainting (separating mask features from global latent representation) (Ju et al., 11 Mar 2024).
Multi-modal translation (parallel branches for authentic and diffusion-generated visual cues) (Wang et al., 23 Jul 2025).
Time series separation, structured signal restoration, and discriminative tasks where explicit decoupling of correlated sources enhances both interpretability and performance.

A plausible implication is that dual-branch architectures, particularly when paired with joint posterior optimization, provide a principled path toward robust generative and discriminative models in scenarios with complex, multi-component data.

7. Limitations and Open Research Questions

Despite demonstrated efficacy, dual-branch diffusion frameworks introduce increased computational cost and require careful tuning of consistency constraints (e.g., $\lambda_{\text{dc}}$ in D4PM). A further open issue is the identification of optimal architectures for distinct signal classes or domain-specific artifacts. Additionally, generalization to higher-dimensional, multi-channel, or multi-modal datasets may require architectural and training innovations yet to be fully explored.

In conclusion, dual-branch diffusion frameworks, exemplified by models such as D4PM (Shao et al., 17 Sep 2025), represent a methodological advance in modeling complex, mixed, or multi-component signals within the diffusion paradigm. By combining parallel modeling, conditional feature modulation, and principled joint sampling, these frameworks provide enhanced interpretability, generalization, and denoising fidelity, particularly in challenging artifact removal and signal separation tasks.