Relay Diffusion Models: Theory and Applications

Updated 20 March 2026

Relay Diffusion Models are a family of techniques that relay the diffusion process across multiple stages, ensuring smooth information transfer without restarting from independent noise.
In image synthesis, RDMs achieve high fidelity by integrating block noise injection and patch-wise blurring to seamlessly transition from low- to high-resolution stages.
In molecular communications, RDM employs sense-and-forward and decode-and-forward strategies to improve channel capacity and reduce error rates by relaying molecular signals.

The Relay Diffusion Model (RDM) refers to a family of methodologies in which a diffusion process is relayed across multiple stages or nodes, enabling seamless propagation or communication with improved efficiency, fidelity, or biological plausibility. The term encompasses applications in both deep generative modeling (specifically, multi-resolution image synthesis) and molecular information transfer. RDM frameworks are characterized by their use of relayed or conditioned diffusion processes, which transfer and transform states—such as images, noise, or molecule concentrations—across resolutions or communication links without restarting from independent noise or initial conditions.

1. Theoretical Underpinnings and Motivation

High-resolution generative modeling with diffusion models has historically been hindered by signal-to-noise ratio (SNR) mismatches at increased resolutions. The same noise level, applied to a higher-resolution image, yields a disproportionately higher SNR in the lowest DCT frequency bands. As a result, even the most heavily noised training states still carry low-frequency structure while sampling starts from pure noise, and this mismatch distorts sampling and degrades output fidelity. Classical cascaded diffusion models (CDM) address this by generating a low-resolution image, then performing super-resolution as a separate, explicitly conditioned process; however, they must either inject artificial noise into the conditioning signal or forgo continuity in the diffusion trajectory.
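A quick numerical check of this SNR argument, as a minimal sketch: it compares the lowest-frequency (DC) DCT band of a toy image and of its 2× nearest-neighbour upsampled copy under the same per-pixel Gaussian noise level. The random stand-in image, the 2× factor, σ = 1, and the helper names `dc_coefficient` and `dc_snr` are illustrative assumptions.

```python
import numpy as np


def dc_coefficient(img: np.ndarray) -> float:
    """(0, 0) coefficient of the orthonormal 2-D DCT-II of a square image.

    For the orthonormal DCT-II this equals sum(img) / n, with n the side length,
    so no DCT library is needed for the lowest-frequency band.
    """
    n = img.shape[0]
    return float(img.sum() / n)


def dc_snr(img: np.ndarray, sigma: float) -> float:
    # An orthonormal transform maps i.i.d. N(0, sigma^2) pixel noise to i.i.d.
    # N(0, sigma^2) coefficients, so every frequency band gets noise power sigma^2.
    return dc_coefficient(img) ** 2 / sigma**2


rng = np.random.default_rng(0)
low = rng.uniform(0.2, 0.8, size=(64, 64))   # hypothetical stand-in for a 64x64 image
high = np.kron(low, np.ones((2, 2)))         # same content at 128x128 (nearest-neighbour)

sigma = 1.0                                  # identical per-pixel noise level at both sizes
ratio = dc_snr(high, sigma) / dc_snr(low, sigma)
print(f"DC-band SNR ratio, 128x128 vs 64x64: {ratio:.2f}")  # prints 4.00
```

Because the noise power per orthonormal coefficient stays at σ² while the DC signal grows with the side length, the DC-band SNR rises by the square of the upsampling factor.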

Relay Diffusion Models as formulated in image synthesis (e.g., "Relay Diffusion: Unifying diffusion process across resolutions for image synthesis" (Teng et al., 2023), "CogView3: Finer and Faster Text-to-Image Generation via Relay Diffusion" (Zheng et al., 2024)) address this by constructing a statistically consistent handoff between low- and high-resolution stages. The key ingredients are block noise insertion, which maintains statistical equivalence across upsampling, and a blurring process that bridges the two resolutions: patch-wise blurring in RDM, or a linearly scheduled blend of low- and high-resolution latent states in CogView3. This lets the diffusion chain continue from the low-resolution result rather than restart from pure noise, avoiding the artifacts and inefficiencies of standard cascaded approaches.

In molecular communications, RDM arises in the context of physically relaying signals through intermediate nodes (e.g., synthetic bacteria acting as relay nodes) to counteract distance-dependent signal attenuation, enabling reliable communication across larger spatial scales. The analysis combines Fickian diffusion (Gaussian spreading of released molecules), stochastic receptor activation, and multi-type molecular relaying to improve effective communication bandwidth and reduce error rates ("Relaying in Diffusion-Based Molecular Communication" (Einolghozati et al., 2014)).

2. Algorithmic Components and Mathematical Formulation

Image Synthesis: Relay Diffusion Generative Pipelines

Modern RDMs consist of at least two main algorithmic stages:

Base diffusion: A standard UNet-based latent diffusion model generates a low-resolution image $z_0^{\text{low}}$ from a prompt or random seed, using a forward noising process

$q(z_t | z_{t-1}) = \mathcal{N}(z_t; \sqrt{1-\beta_t} z_{t-1}, \beta_t I),$
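Iterating this one-step kernel gives the familiar closed form $q(z_t \mid z_0) = \mathcal{N}(z_t; \sqrt{\bar\alpha_t}\, z_0, (1-\bar\alpha_t) I)$ with $\bar\alpha_t = \prod_{s \le t}(1-\beta_s)$. The sketch below checks this by propagating the mean and variance step by step. The linear β schedule and the helpers `propagate_moments` and `q_sample` are illustrative assumptions, not the schedules used by RDM or CogView3.

```python
import numpy as np

# Illustrative linear beta schedule (assumption; not the RDM or CogView3 schedule).
T = 1000
betas = np.linspace(1e-4, 0.02, T)


def propagate_moments(z0: float, betas: np.ndarray) -> tuple[float, float]:
    """Apply q(z_t | z_{t-1}) = N(sqrt(1 - beta_t) z_{t-1}, beta_t) once per step,
    tracking the mean and variance of z_t when z_0 is a fixed value."""
    mean, var = z0, 0.0
    for beta in betas:
        mean = np.sqrt(1.0 - beta) * mean   # the mean is only rescaled
        var = (1.0 - beta) * var + beta     # old variance shrinks, fresh noise adds beta_t
    return float(mean), float(var)


def q_sample(z0: np.ndarray, alpha_bar_t: float, rng: np.random.Generator) -> np.ndarray:
    """Jump straight to step t: z_t = sqrt(alpha_bar_t) z_0 + sqrt(1 - alpha_bar_t) eps."""
    eps = rng.standard_normal(z0.shape)
    return np.sqrt(alpha_bar_t) * z0 + np.sqrt(1.0 - alpha_bar_t) * eps


alpha_bar = np.cumprod(1.0 - betas)          # alpha_bar_t = prod_{s<=t} (1 - beta_s)
mean, var = propagate_moments(2.0, betas)
print(mean, np.sqrt(alpha_bar[-1]) * 2.0)    # step-by-step vs closed-form mean
print(var, 1.0 - alpha_bar[-1])              # step-by-step vs closed-form variance
```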

and a learned reverse denoising process.

Relay mechanism: The low-resolution output is decoded to pixels, upsampled (via bilinear or nearest-neighbor interpolation), and re-encoded (if in latent space), yielding $z_0^L$. Gaussian (or block) noise is added to replicate the statistical properties of the scale transition, forming the starting point for the second diffusion process at higher resolution:
- For CogView3: A blurring schedule $F(z_0, t) = (1 - t/T)\,z_0 + (t/T)\,z_0^L$ linearly interpolates between the true high-resolution target $z_0$ (at $t = 0$) and the upsampled low-resolution latent $z_0^L$ (at $t = T$), with
$q(z_t \mid z_0) = \mathcal{N}(z_t; F(z_0, t), \sigma_t^2 I).$
- For RDM (in image space): Block-correlated Gaussian noise, mimicking upsampled noise, is added, and patch-wise DCT blurring is applied locally within each spatial block.
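A minimal sketch of the CogView3-style relay forward process, using NumPy arrays as stand-ins for latents. For brevity it upsamples a toy latent directly with nearest-neighbour repetition instead of decoding, upsampling in pixel space, and re-encoding; the shapes, `sigma_t = 0.5`, and the helper names `relay_mean` and `relay_sample` are hypothetical.

```python
import numpy as np


def relay_mean(z0_high: np.ndarray, z0_low_up: np.ndarray, t: int, T: int) -> np.ndarray:
    """Linear blurring schedule F(z_0, t) = (1 - t/T) z_0 + (t/T) z_0^L."""
    w = t / T
    return (1.0 - w) * z0_high + w * z0_low_up


def relay_sample(z0_high, z0_low_up, t, T, sigma_t, rng):
    """Draw z_t ~ N(F(z_0, t), sigma_t^2 I) for the relay stage."""
    eps = rng.standard_normal(z0_high.shape)
    return relay_mean(z0_high, z0_low_up, t, T) + sigma_t * eps


rng = np.random.default_rng(0)
z0_high = rng.standard_normal((4, 16, 16))                   # hypothetical high-res latent (C, H, W)
z0_low = z0_high.reshape(4, 8, 2, 8, 2).mean(axis=(2, 4))    # hypothetical low-res latent
# Nearest-neighbour upsampling stands in for decode -> upsample -> re-encode.
z0_low_up = np.repeat(np.repeat(z0_low, 2, axis=1), 2, axis=2)

T = 10
z_mid = relay_sample(z0_high, z0_low_up, t=5, T=T, sigma_t=0.5, rng=rng)
```

At $t = 0$ the mean is the high-resolution target and at $t = T$ it is the upsampled low-resolution latent, so sampling can start from a noised $z_0^L$ rather than from pure noise.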

Both settings use L2 denoising objectives:

$\mathcal{L} = \mathbb{E}_{z_0, t, \epsilon}\;\| D(z_t, t, c) - z_0 \|_2^2\;,$

with conditioning on text prompts or class labels as appropriate.
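The objective can be estimated per batch as in the sketch below, which assumes the variance-preserving forward process from the base-diffusion equation (via $\bar\alpha_t$) and a denoiser that predicts $z_0$ directly. RDM's blurred forward process would replace the line that forms `z_t`. The function names, batch shapes, and the toy `naive_denoiser` are hypothetical; a real model would be a conditioned UNet.

```python
from typing import Callable

import numpy as np

# Hypothetical denoiser signature: D(z_t, t, c) -> predicted z_0.
Denoiser = Callable[[np.ndarray, np.ndarray, object], np.ndarray]


def l2_denoising_loss(denoiser: Denoiser, z0: np.ndarray, alpha_bar: np.ndarray,
                      cond: object, rng: np.random.Generator) -> float:
    """Monte Carlo estimate of E_{z0, t, eps} || D(z_t, t, c) - z_0 ||_2^2 for one batch."""
    batch = z0.shape[0]
    t = rng.integers(0, len(alpha_bar), size=batch)            # t ~ Uniform{0, ..., T-1}
    ab = alpha_bar[t].reshape(-1, *([1] * (z0.ndim - 1)))      # broadcast per sample
    eps = rng.standard_normal(z0.shape)
    z_t = np.sqrt(ab) * z0 + np.sqrt(1.0 - ab) * eps           # forward-noised input
    pred = denoiser(z_t, t, cond)                              # model predicts z_0
    sq_err = (pred - z0) ** 2
    return float(sq_err.reshape(batch, -1).sum(axis=1).mean())  # squared L2 norm, batch mean


alpha_bar = np.cumprod(1.0 - np.linspace(1e-4, 0.02, 1000))
rng = np.random.default_rng(0)
z0 = rng.standard_normal((8, 4, 16, 16))                       # hypothetical batch of latents


def naive_denoiser(z_t, t, cond):
    # Hypothetical baseline: undo the sqrt(alpha_bar) scaling; ignores noise and cond.
    return z_t / np.sqrt(alpha_bar[t].reshape(-1, 1, 1, 1))


loss = l2_denoising_loss(naive_denoiser, z0, alpha_bar, cond=None, rng=rng)
```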

In RDM (Teng et al., 2023), a second-order stochastic sampler (Heun's method, as in the Karras sampler) is used for efficient inference in high-resolution space, adapted to accommodate both patch-wise blur and block noise.
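The core of such a sampler can be sketched as a deterministic Heun (second-order) integration of the probability-flow ODE $dx/d\sigma = (x - D(x, \sigma))/\sigma$ on a Karras-style σ schedule. This is a simplified sketch: the stochastic "churn" step that re-injects noise, and RDM's adaptations for blurring and block noise, are omitted, and the function names are hypothetical.

```python
import numpy as np


def karras_sigmas(n: int, sigma_min: float = 0.002, sigma_max: float = 80.0,
                  rho: float = 7.0) -> np.ndarray:
    """Noise levels from sigma_max down to sigma_min (Karras-style spacing), then 0."""
    ramp = np.linspace(0.0, 1.0, n)
    inv_max, inv_min = sigma_max ** (1 / rho), sigma_min ** (1 / rho)
    sigmas = (inv_max + ramp * (inv_min - inv_max)) ** rho
    return np.append(sigmas, 0.0)


def heun_sample(denoiser, x: np.ndarray, sigmas: np.ndarray) -> np.ndarray:
    """Deterministic Heun integration of dx/dsigma = (x - D(x, sigma)) / sigma."""
    for s_cur, s_next in zip(sigmas[:-1], sigmas[1:]):
        d_cur = (x - denoiser(x, s_cur)) / s_cur            # slope at the current level
        x_euler = x + (s_next - s_cur) * d_cur              # first-order (Euler) proposal
        if s_next > 0:
            d_next = (x_euler - denoiser(x_euler, s_next)) / s_next  # slope at the proposal
            x = x + (s_next - s_cur) * 0.5 * (d_cur + d_next)        # trapezoidal correction
        else:
            x = x_euler                                     # last step to sigma = 0: Euler only
    return x


# Usage: for a data set consisting of one point x0, the exact denoiser returns x0.
rng = np.random.default_rng(0)
x0 = np.array([0.5, -1.0, 2.0])
exact_denoiser = lambda x, sigma: x0
sigmas = karras_sigmas(18)
x_T = x0 + sigmas[0] * rng.standard_normal(x0.shape)
sample = heun_sample(exact_denoiser, x_T, sigmas)
print(np.allclose(sample, x0))
```

With the exact denoiser for a single data point the ODE trajectory is a straight line in σ, so the sampler recovers `x0`; this makes a convenient sanity check before plugging in a learned model.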

Molecular Communications: Relayed Information Transfer

For information-theoretic analysis of molecular relaying, RDM models the molecular concentration $C(r, t)$ with Fick's second law, extended by a degradation term and a source term:

$\frac{\partial C(r, t)}{\partial t} = D \nabla^2 C(r, t) - \lambda C(r, t) + S(r, t),$

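where $D$ is the diffusion coefficient (distinct from the denoiser $D$ above), $\lambda$ the molecular degradation rate, and $S(r, t)$ the release of molecules by the transmitter and relays. Relay nodes can adopt sense-and-forward (immediate concentration relaying), multi-type relaying (orthogonal molecular species), or decode-and-forward (symbol-level) strategies. Channel capacity and symbol error rate can then be characterized as functions of relay placement, molecule population, and binding statistics at the receiver.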
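To see why relays help, consider the standard impulse response of this equation: for an instantaneous point release of $N$ molecules at the origin ($S = N\delta(r)\delta(t)$) in unbounded 3-D space, $C(r,t) = N(4\pi D t)^{-3/2}\exp(-r^2/(4Dt) - \lambda t)$. This textbook Green's function is not taken from Einolghozati et al.; the diffusion coefficient, molecule count, and the helper name `impulse_concentration` are illustrative assumptions.

```python
import numpy as np


def impulse_concentration(r, t, n_molecules, D, lam=0.0):
    """Concentration at distance r and time t > 0 after n_molecules are released at the
    origin at t = 0 in unbounded 3-D space (no boundaries, no receptors):
        C(r, t) = N / (4 pi D t)^{3/2} * exp(-r^2 / (4 D t) - lam * t)
    """
    return n_molecules / (4.0 * np.pi * D * t) ** 1.5 * np.exp(-r**2 / (4.0 * D * t) - lam * t)


D = 100.0   # diffusion coefficient in um^2/s (illustrative)
N = 1e4     # molecules per release (illustrative)

peaks = {}
for r in (10.0, 20.0):                 # distances in um
    t_peak = r**2 / (6.0 * D)          # time of peak concentration when lam = 0
    peaks[r] = impulse_concentration(r, t_peak, N, D)
    print(f"r = {r:4.1f} um: peak at t = {t_peak:.3f} s, C = {peaks[r]:.3e} per um^3")

print(peaks[20.0] / peaks[10.0])       # about 0.125: doubling the distance cuts the peak 8x
```

With $\lambda = 0$ the peak occurs at $t = r^2/(6D)$ and scales as $r^{-3}$, so halving the transmitter-to-receiver distance with a midpoint relay raises the peak concentration seen at each hop eightfold.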

3. Relay Mechanism Details and Information Transfer

In both the image synthesis and molecular communication settings, the relay mechanism ensures smooth transfer of information-carrying states (images, noise, molecular concentrations) across spatial, scale, or population bottlenecks.

Image Synthesis

Upsampling and noise injection: After the base stage, the decoded image is upsampled to the higher target resolution and re-encoded if necessary. Noise is introduced with specific blockwise spatial correlations so that the high-resolution input approximates an upscaled noisy low-resolution image, preserving the diffusion trajectory's statistical character.
Blurring and diffusion continuation: Patch-wise blurring applies local DCT-domain smoothing within each patch, allowing the second-stage diffusion process to refine details consistent with the global structure while concentrating modeling capacity on the high-frequency bands (Teng et al., 2023).
Schedule adaptation: The noise schedule for the high-resolution relay stage is truncated or adjusted to match the actual frequency-domain SNR, correcting the resolution-dependent SNR mismatch that limits single-stage noise schedules.
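The patch-wise blurring step can be sketched as follows, assuming an orthonormal DCT-II applied independently to each non-overlapping $p \times p$ patch and a heat-equation-style damping $\exp(-\tau\pi^2(u^2+v^2)/p^2)$ of coefficient $(u, v)$. The damping law, the parameter `tau`, and the function names are assumptions for illustration; RDM's exact blur schedule is not reproduced.

```python
import numpy as np


def dct_matrix(n: int) -> np.ndarray:
    """Orthonormal DCT-II matrix C: coefficients X = C @ x, inverse x = C.T @ X."""
    k = np.arange(n)[:, None]
    i = np.arange(n)[None, :]
    c = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    c[0, :] = np.sqrt(1.0 / n)
    return c


def patchwise_blur(img: np.ndarray, p: int, tau: float) -> np.ndarray:
    """Blur each non-overlapping p x p patch independently in the DCT domain (assumed damping)."""
    h, w = img.shape
    if h % p or w % p:
        raise ValueError("image size must be a multiple of the patch size")
    c = dct_matrix(p)
    patches = img.reshape(h // p, p, w // p, p).transpose(0, 2, 1, 3)  # (H/p, W/p, p, p)
    coeffs = c @ patches @ c.T                                          # per-patch 2-D DCT
    u = np.arange(p)
    freq = np.pi**2 * (u[:, None] ** 2 + u[None, :] ** 2) / p**2        # 0 for the DC term
    coeffs = coeffs * np.exp(-tau * freq)                               # damp non-DC frequencies
    blurred = c.T @ coeffs @ c                                          # per-patch inverse DCT
    return blurred.transpose(0, 2, 1, 3).reshape(h, w)


rng = np.random.default_rng(0)
img = rng.standard_normal((8, 8))
p = 2
fully_blurred = patchwise_blur(img, p, tau=1e6)
low = img.reshape(4, p, 4, p).mean(axis=(1, 3))                        # patch means
print(np.allclose(fully_blurred, np.kron(low, np.ones((p, p)))))       # nearest-upsampled low-res
```

Because the DC coefficient is never damped, each patch converges to its own mean as `tau` grows, which is the nearest-neighbour upsampled low-resolution image; this is what lets the high-resolution chain pick up from the low-resolution result.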

Molecular Communications

Sense-and-forward: The relay detects the incoming concentration and emits molecules at a rate proportional to its sensing, thus directly transferring the molecular signal without decoding (Einolghozati et al., 2014).
Multi-type relaying: The relay emits molecules of a second type that the receiver can detect independently, giving it an additional observation of each symbol and increasing capacity.
Decode-and-forward: The relay decodes the received concentration into a symbol and transmits an encoded signal for that symbol, which the receiver combines with the original transmission to lower error rates, especially in low-SNR regimes.
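A toy comparison of direct transmission and two-hop decode-and-forward, under stated assumptions: on-off keying with equiprobable bits, Poisson-distributed molecule counts at each detector, a fixed background count, and a midpoint relay that re-emits with the transmitter's strength. The 8× signal gain per hop assumes peak concentration falls as $1/r^3$, and the receiver here does not combine the direct transmission, so the sketch understates decode-and-forward. All numbers and function names are hypothetical and not from Einolghozati et al.

```python
import math


def poisson_cdf(k: int, lam: float) -> float:
    """P(K <= k) for K ~ Poisson(lam); returns 0.0 for k < 0."""
    term, total = math.exp(-lam), 0.0
    for i in range(k + 1):
        total += term
        term *= lam / (i + 1)
    return total


def on_off_error(lam0: float, lam1: float) -> float:
    """Lowest bit-error rate of on-off keying with equiprobable bits when the detector
    decides '1' iff the molecule count K >= k, minimised over the threshold k."""
    best = 0.5
    for k in range(1, int(3 * lam1) + 10):
        p_false_alarm = 1.0 - poisson_cdf(k - 1, lam0)   # bit 0 sent, K >= k
        p_miss = poisson_cdf(k - 1, lam1)                # bit 1 sent, K < k
        best = min(best, 0.5 * (p_false_alarm + p_miss))
    return best


background = 1.0                  # mean stray-molecule count per slot (illustrative)
direct_signal = 4.0               # mean extra count over the full distance (illustrative)
hop_signal = 8 * direct_signal    # assumed 1/r^3 peak law: half the distance, 8x the signal

p_direct = on_off_error(background, background + direct_signal)
p_hop = on_off_error(background, background + hop_signal)
# Decode-and-forward: the receiver's bit is wrong iff exactly one of the two hops erred.
p_relay = p_hop * (1 - p_hop) + (1 - p_hop) * p_hop

print(f"direct: {p_direct:.3g}, decode-and-forward: {p_relay:.3g}")
```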

4. Network Architectures, Loss Functions, and Conditioning

RDM frameworks in deep generative models employ UNet-based denoisers with attention-augmented conditioning layers for textual or class inputs. A progressive training regime is often used: in CogView3, the same UNet backbone is gradually adapted to increasing resolutions (256×256, 512×512, 1024×1024) (Zheng et al., 2024). Both stages rely on a simple mean-squared-error denoising loss and use classifier-free guidance to improve sampling.
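Classifier-free guidance combines two denoiser outputs at each sampling step, one with the prompt and one with a null prompt. A minimal sketch, assuming the network was also trained with the condition randomly dropped (not shown) and using a hypothetical guidance scale `w` and helper name `classifier_free_guidance`:

```python
import numpy as np


def classifier_free_guidance(pred_uncond: np.ndarray, pred_cond: np.ndarray,
                             w: float) -> np.ndarray:
    """Extrapolate from the unconditional prediction toward the conditional one.
    w = 0 gives the unconditional model, w = 1 the conditional model, and w > 1
    strengthens prompt adherence, typically at some cost in sample diversity."""
    return pred_uncond + w * (pred_cond - pred_uncond)


# Usage with stand-in outputs of one denoiser called twice (prompt vs. null prompt).
rng = np.random.default_rng(0)
pred_cond = rng.standard_normal((4, 8, 8))
pred_uncond = rng.standard_normal((4, 8, 8))
guided = classifier_free_guidance(pred_uncond, pred_cond, w=5.0)   # hypothetical scale
```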

Hyperparameter choices are crucial: in RDM, the block noise strength $\alpha$, block size $s$, blur amplitude, and noise schedule truncation directly affect convergence and sample fidelity (Teng et al., 2023). All stages are amenable to mixed-precision acceleration.
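The role of $\alpha$ and $s$ can be illustrated with a block-noise generator that mixes i.i.d. Gaussian noise with low-resolution noise upsampled to constant $s \times s$ blocks. The normalisation $(\epsilon + \alpha\,\epsilon_{\text{block}})/\sqrt{1+\alpha^2}$, which keeps unit per-pixel variance, is an assumption of this sketch and may differ from the paper's exact formula; `block_noise` is a hypothetical name.

```python
import numpy as np


def block_noise(shape: tuple[int, int, int], s: int, alpha: float,
                rng: np.random.Generator) -> np.ndarray:
    """Gaussian noise with s x s block correlations (assumed mixing rule).

    eps = (eps_iid + alpha * eps_block) / sqrt(1 + alpha^2), where eps_block is
    low-resolution noise repeated over s x s blocks (nearest-neighbour upsampling).
    """
    c, h, w = shape
    if h % s or w % s:
        raise ValueError("height and width must be multiples of the block size s")
    eps_iid = rng.standard_normal(shape)
    eps_low = rng.standard_normal((c, h // s, w // s))
    eps_block = np.repeat(np.repeat(eps_low, s, axis=1), s, axis=2)
    return (eps_iid + alpha * eps_block) / np.sqrt(1.0 + alpha**2)


rng = np.random.default_rng(0)
eps = block_noise((3, 256, 256), s=4, alpha=1.0, rng=rng)

# Correlation of horizontally adjacent pixels inside one block vs. across a block edge.
inside = np.corrcoef(eps[:, ::4, 0::4].ravel(), eps[:, ::4, 1::4].ravel())[0, 1]
across = np.corrcoef(eps[:, ::4, 3::4][:, :, :-1].ravel(), eps[:, ::4, 4::4].ravel())[0, 1]
print(f"variance {eps.var():.3f}, within-block corr {inside:.3f}, cross-block corr {across:.3f}")
```

Under this mixing rule, pixels in the same block have correlation $\alpha^2/(1+\alpha^2)$ (0.5 for $\alpha = 1$), while pixels in different blocks stay uncorrelated; larger $s$ or $\alpha$ pushes more of the noise power into low frequencies.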

In biological RDM, mathematical models govern the design of bacterial population sizes, relay node positioning, molecule type choices, and the resultant statistical capacity and detection rates, with performance mediated by the regime of receptor activation and noise propagation (Einolghozati et al., 2014).

5. Computational Efficiency and Quantitative Evaluation

Relay Diffusion Models demonstrate substantial computational advantages over traditional single-stage or cascaded pipelines:

In CogView3, a denoising step costs on the order of $N^2$ in the base stage (512×512) and $4N^2$ in the relay super-resolution stage (1024×1024), which has four times as many pixels. Allocating, e.g., 50 steps at 512×512 and 10 steps at 1024×1024 costs $50N^2 + 10 \cdot 4N^2 = 90N^2$, roughly a 2.2× speedup over 50 steps at high resolution, which would cost $200N^2$. Distilled variants reduce the total further, to about $8N^2$, a 25-fold efficiency gain (Zheng et al., 2024).
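The cost comparison reduces to simple arithmetic once the per-step cost is taken to scale with pixel count (1 unit of $N^2$ at 512×512, 4 units at 1024×1024), the same simplification behind the complexity figures above. The distilled total of $8N^2$ is taken from the text rather than derived, and `pipeline_cost` is a hypothetical helper.

```python
# Per-step cost in units of N^2, assuming cost proportional to pixel count.
COST_512 = 1.0    # one denoising step at 512x512
COST_1024 = 4.0   # one step at 1024x1024 (4x the pixels)


def pipeline_cost(base_steps: int, relay_steps: int) -> float:
    """Total cost of a base stage at 512x512 followed by a relay stage at 1024x1024."""
    return base_steps * COST_512 + relay_steps * COST_1024


single_stage = pipeline_cost(0, 50)   # 50 steps directly at 1024x1024 -> 200 N^2
relay = pipeline_cost(50, 10)         # 50 * 1 + 10 * 4 = 90 N^2
distilled = 8.0                       # distilled total quoted in the text, in N^2

speedup = single_stage / relay
distilled_speedup = single_stage / distilled
print(f"relay: {relay:.0f} N^2, speedup {speedup:.2f}x")   # relay: 90 N^2, speedup 2.22x
print(f"distilled speedup {distilled_speedup:.0f}x")         # distilled speedup 25x
```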
Quantitatively, RDM achieves state-of-the-art FID (3.15 on CelebA-HQ 256², 5.27 on ImageNet 256²), surpassing LDM, ADM, DiT, and StyleGAN-XL in direct comparisons. Block noise and adaptive sampling improve sample fidelity and diversity (measured by precision and recall) as well as computational efficiency (quality at low sampling step counts) (Teng et al., 2023).

In molecular RDM applications, sense-and-forward relaying approximately doubles channel capacity in low-concentration regimes. Multi-type relaying provides further improvements, roughly equivalent to increasing the effective bacterial population, while decode-and-forward dramatically reduces error rates in low-SNR regimes (Einolghozati et al., 2014).

6. Generalization, Limitations, and Extensions

Relay Diffusion Models generalize naturally to multiple scales or relay stages, supporting operations across arbitrary integer resolution ratios or relay hops. In wavelet, hybrid, or block-structured domains, the framework can be extended by tuning the blurring operation, upsampling scheme, and injected noise statistics (Teng et al., 2023). In molecular contexts, chaining more than one relay node extends communication range but introduces trade-offs in relay noise accumulation, complexity, and physical constraints (e.g., population limits on bacteria).
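The relay-noise side of this trade-off can be quantified in the simplest case, assuming each decode-and-forward hop behaves as an independent binary symmetric channel with the same bit-flip probability $p$: the end-to-end bit is wrong iff an odd number of hops flip it, giving $(1 - (1-2p)^K)/2$ for $K$ hops. In practice shorter hops also lower $p$, which is what makes chaining worthwhile; `multihop_bit_error` is a hypothetical helper.

```python
def multihop_bit_error(p: float, hops: int) -> float:
    """End-to-end bit-error probability over `hops` independent binary symmetric hops,
    each flipping the bit with probability p: the bit is wrong iff an odd number flip."""
    return 0.5 * (1.0 - (1.0 - 2.0 * p) ** hops)


for hops in (1, 2, 4, 8):
    print(hops, f"{multihop_bit_error(1e-3, hops):.3e}")
```

For small $p$ the end-to-end error grows roughly linearly in the number of hops, so each added relay must reduce per-hop error by more than it adds through accumulation.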

Notable limitations include:

The need to tune blurring strength, block size, and noise schedule truncation for each new data domain or upsampling factor.
Restrictions of patchwise blurring to integer upsampling ratios; fractional or learned upsampling would require new theoretical developments.
For very high-resolution image modeling (beyond 1024×1024), memory limitations may necessitate further subdivision into more than two relay stages.

Potential extensions include adaptive relay scheduling, integration into multimodal text-to-image or video pipelines, and unified SDE frameworks that combine Gaussian, block, and blur noise. In biological systems, further work could refine relay placement optimization and molecule type selection, or explore the role of relay networks in complex multicellular communication.

7. Comparative Summary and Impact

Relay Diffusion Models substantively advance both generative modeling and molecular communications by mediating the transition between disparate scales or communication nodes via statistically principled, relay-based mechanisms. Across applications:

Application Domain	RDM mechanism	Key Benefits
Image Synthesis	Block noise + patchwise blurring, relay across UNet stages (Teng et al., 2023, Zheng et al., 2024)	High sample quality, efficient sampling, seamless multi-resolution diffusion
Molecular Communication	Molecular relay node(s) (Einolghozati et al., 2014)	Increased range, channel capacity, and reliability

Empirical results in generative modeling confirm state-of-the-art sample metrics and substantial reductions in compute requirements. In synthetic biological systems, RDM enables communication over distances otherwise limited by rapid signal decay and biological noise. The model's versatility and principled construction suggest it will remain a core component of multi-stage diffusion frameworks across domains.


References

Teng et al. (2023). Relay Diffusion: Unifying diffusion process across resolutions for image synthesis.

Zheng et al. (2024). CogView3: Finer and Faster Text-to-Image Generation via Relay Diffusion.

Einolghozati et al. (2014). Relaying in Diffusion-Based Molecular Communication.
