TFCDiff: Time-Frequency Diffusion Model

Updated 2 July 2026

Time-Frequency Complementary Diffusion (TFCDiff) is a diffusion-based paradigm that integrates staged noise injection in both time and frequency domains to preserve key signal structures.
It employs reversible spectral decompositions and dual-branch architectures that adaptively manage noise schedules and band-wise embeddings for enhanced denoising.
TFCDiff has demonstrated measurable improvements in applications such as time series forecasting, ECG denoising, and RF signal generation with performance gains in MSE, SSIM, and other fidelity metrics.

Time-Frequency Complementary Diffusion (TFCDiff) refers to a family of diffusion-based paradigms in which the forward and/or reverse stochastic processes incorporate noise or iterative transformations in both the time and frequency domains. This strategy contrasts with classical approaches, which apply isotropic noise directly in the time (or pixel) space, and aims to preserve, manipulate, and reconstruct structured temporal or spectral patterns more effectively. Supporting evidence comes from time series forecasting and imputation, medical signal denoising, RF signal generation, data unlearning, and quantum many-body transport. Core elements include staged or coupled domain-wise noise injection, reversible spectral decomposition, domain-tailored denoising architectures, and frequency-aware scheduling or masking.

1. Theoretical Foundations and Mathematical Formulation

Time-Frequency Complementary Diffusion frameworks generalize denoising diffusion probabilistic models (DDPMs) to jointly or sequentially process temporal and spectral representations. The standard DDPM forward process is: $q(x_t|x_{t-1}) = \mathcal{N}( x_t ; \sqrt{1-\beta_t}x_{t-1},\, \beta_t I ),$ with closed form: $x_t = \sqrt{\bar{\alpha}_t}x_0 + \sqrt{1-\bar{\alpha}_t}\epsilon, \quad \epsilon \sim \mathcal{N}(0, I),$ where $\bar{\alpha}_t = \prod_{s=1}^{t} (1-\beta_s)$ .
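The closed form above can be sampled directly, without iterating through the intermediate steps. The following is a minimal NumPy sketch; the linear schedule for $\beta_t$, the sine test signal, and the function names `make_alpha_bar` and `q_sample` are illustrative assumptions rather than part of any cited method.

```python
import numpy as np

def make_alpha_bar(betas):
    """Cumulative product alpha_bar_t = prod_{s<=t} (1 - beta_s)."""
    return np.cumprod(1.0 - betas)

def q_sample(x0, t, alpha_bar, rng):
    """Draw x_t from q(x_t | x_0) via the closed form; t is 1-indexed."""
    eps = rng.standard_normal(x0.shape)
    a = alpha_bar[t - 1]
    x_t = np.sqrt(a) * x0 + np.sqrt(1.0 - a) * eps
    return x_t, eps

rng = np.random.default_rng(0)
betas = np.linspace(1e-4, 0.02, 1000)      # assumed linear schedule
alpha_bar = make_alpha_bar(betas)
x0 = np.sin(np.linspace(0.0, 4.0 * np.pi, 256))  # toy signal
x_t, eps = q_sample(x0, t=500, alpha_bar=alpha_bar, rng=rng)
```

Because $\bar{\alpha}_t$ decreases monotonically, larger `t` yields samples that are closer to pure Gaussian noise.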

In TFCDiff, the signal $x_0$ is decomposed by a lossless and invertible transform (Fourier, DCT, rDFT, or wavelet):

For spectral-stage decomposition, $x_0 = \sum_{k=1}^K f_0^{(k)}$ , where $f_0^{(k)}$ are orthogonal frequency (or scale) bands.
Staged noise injection is applied per spectral component: $f_t^{(k)} = \sqrt{1-\beta_t} f_{t-1}^{(k)} + \sqrt{d_k \beta_t} \epsilon,$ with $d_k = \mathbb{E}[|f_0^{(k)}|^2]$ capturing the energy of each component (Caldas et al., 29 Jan 2026).
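A minimal sketch of the band decomposition and one per-band noising step follows. It assumes equal-width, disjoint rFFT masks as the bands and estimates $d_k$ as the mean squared value of each band over a single signal; both choices, and the names `split_bands` and `band_step`, are illustrative assumptions and not the construction used in the cited work.

```python
import numpy as np

def split_bands(x0, num_bands):
    """Split a real 1-D signal into additive components using disjoint
    rFFT masks, so that the components sum back to x0."""
    n = x0.shape[-1]
    spec = np.fft.rfft(x0)
    edges = np.linspace(0, spec.shape[-1], num_bands + 1).astype(int)
    bands = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        masked = np.zeros_like(spec)
        masked[lo:hi] = spec[lo:hi]          # keep only this band's bins
        bands.append(np.fft.irfft(masked, n=n))
    return np.stack(bands)                   # shape (num_bands, n)

def band_step(f_prev, beta_t, d, rng):
    """One forward step per band: noise variance is scaled by band energy d_k."""
    eps = rng.standard_normal(f_prev.shape)
    return np.sqrt(1.0 - beta_t) * f_prev + np.sqrt(d[:, None] * beta_t) * eps

rng = np.random.default_rng(0)
grid = np.linspace(0.0, 1.0, 256, endpoint=False)
x0 = np.sin(2 * np.pi * 3 * grid) + 0.3 * np.sin(2 * np.pi * 50 * grid)
bands0 = split_bands(x0, num_bands=4)
d = (bands0 ** 2).mean(axis=1)               # assumed estimate of E[|f_0^(k)|^2]
bands1 = band_step(bands0, beta_t=0.01, d=d, rng=rng)
```

Since the masks have disjoint frequency support, the components are mutually orthogonal and their sum recovers `x0` up to floating-point error.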

Alternatively, in coupled time-frequency injection, as in HyFAD and RF-Diffusion,

$\begin{aligned} x_k^f &= \sqrt{\alpha_k^f}\,\mathcal{F}(x_{k-1}^t) + \sqrt{\beta_k^f}\sqrt{1-\lambda}\,(\Lambda \epsilon_k^f), \\ x_k^t &= \sqrt{\alpha_k^t}\,\mathcal{F}^{-1}(x_k^f) + \sqrt{\beta_k^t}\sqrt{\lambda}\,\epsilon_k^t, \end{aligned}$

where $\lambda$ balances variance between time and frequency (Gao et al., 3 Jun 2026). In RF contexts, complex-valued signals are blurred in frequency and noised in time.
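One coupled step can be sketched as below. Several details are assumptions of this sketch, since the text does not fix them: $\mathcal{F}$ is taken to be the unitary FFT, $\Lambda$ is treated as an element-wise spectral weighting (all ones here), the frequency-domain noise is the FFT of real white noise so that the inverse transform stays real, and the function name `coupled_forward_step` is hypothetical.

```python
import numpy as np

def coupled_forward_step(x_prev, alpha_f, beta_f, alpha_t, beta_t, lam, Lambda, rng):
    """One coupled step: noise in the frequency domain, then in the time domain."""
    n = x_prev.shape[-1]
    # Assumed: spectral noise is the unitary FFT of real white noise.
    eps_f = np.fft.fft(rng.standard_normal(n), norm="ortho")
    x_f = (np.sqrt(alpha_f) * np.fft.fft(x_prev, norm="ortho")
           + np.sqrt(beta_f) * np.sqrt(1.0 - lam) * (Lambda * eps_f))
    eps_t = rng.standard_normal(n)
    # .real drops round-off (or any imaginary part if Lambda is not symmetric).
    x_t = (np.sqrt(alpha_t) * np.fft.ifft(x_f, norm="ortho").real
           + np.sqrt(beta_t) * np.sqrt(lam) * eps_t)
    return x_f, x_t

rng = np.random.default_rng(0)
x = np.cos(np.linspace(0.0, 6.0 * np.pi, 128))
Lambda = np.ones(128)                        # assumed flat spectral weighting
x_f, x_t = coupled_forward_step(x, 0.99, 0.01, 0.99, 0.01, 0.5, Lambda, rng)
```

Setting `lam` closer to 1 shifts the injected variance toward the time-domain term, and closer to 0 toward the frequency-domain term. For complex-valued RF signals the `.real` projection would be dropped and complex noise used instead.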

The reverse (denoising) process mirrors the forward structure (in either sequential or staged order), parameterized by neural networks, with explicit or learned band-wise scheduling and frequency-aware embeddings.

2. Signal Recovery, SNR, and Pattern Preservation

A core insight of TFCDiff is the structured preservation of high signal-to-noise ratio (SNR) for dominant frequencies or long-range temporal patterns. By injecting noise first in low-energy (fine-detail) spectral bands and deferring the corruption of high-energy (trend or periodic) components, the framework maintains the integrity of crucial structural information throughout the diffusion trajectory. For a staged approach, the total observed SNR at a given stage is the ratio of total signal energy to the noise energy accumulated over the bands corrupted so far. Early stages (small stage index) “hold back” noise on dominant frequencies, so global structure is maintained for longer, which improves extrapolation of seasonality and trends in time series (Caldas et al., 29 Jan 2026) and denoising performance in biomedical signals (Li et al., 20 Nov 2025).
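As a worked check on the per-band behaviour, assume the per-component recursion from Section 1 with a single schedule $\beta_t$ shared by all bands (an assumption of this sketch; a staged schedule would instead delay or rescale $\beta_t$ per band). Unrolling the recursion gives

$f_t^{(k)} = \sqrt{\bar{\alpha}_t}\, f_0^{(k)} + \sqrt{d_k (1-\bar{\alpha}_t)}\, \epsilon, \quad \epsilon \sim \mathcal{N}(0, I),$

since the noise variance $v_t$ obeys $v_t = (1-\beta_t) v_{t-1} + d_k \beta_t$ with $v_0 = 0$, whose solution is $v_t = d_k (1-\bar{\alpha}_t)$. With $d_k = \mathbb{E}[|f_0^{(k)}|^2]$, the per-band SNR is

$\mathrm{SNR}_k(t) = \frac{\bar{\alpha}_t\, \mathbb{E}[|f_0^{(k)}|^2]}{d_k (1-\bar{\alpha}_t)} = \frac{\bar{\alpha}_t}{1-\bar{\alpha}_t},$

which is the same for every band. Under isotropic noise (replacing $d_k$ by 1) the ratio becomes $\bar{\alpha}_t d_k / (1-\bar{\alpha}_t)$, so low-energy bands are drowned out first. Energy-scaled injection removes that imbalance, and any remaining ordering between bands then comes only from how the schedule is staged.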

In coarse-to-fine hybrid schedules, such as HyFAD, time-domain steps first recover large-scale low-frequency trends; frequency-domain steps then refine mid- and high-frequency structure, guided by step-dependent, band-wise embeddings (Gao et al., 3 Jun 2026).

3. Model Architectures and Step Embedding Strategies

TFCDiff is typically model-agnostic with respect to the score (denoising) network; it can be layered atop U-Net, S4-based, Transformer, or complex-valued backbones. However, architecture can be enhanced for domain-adaptive processing:

In frequency-domain denoising, inputs are spectral coefficients (e.g., DCT, FFT), often truncated to focus on relevant bands (Li et al., 20 Nov 2025, Caldas et al., 29 Jan 2026).
Hybrid dual-branch architectures perform sequential denoising in time and frequency, each branch equipped with its own step embedding reflecting noise schedules and spectral priorities (Gao et al., 3 Jun 2026).
Frequency-aware step embeddings modulate denoising attention to bands likely to survive noise at each step. Gates, schedules, and reliability weights are composed and mixed into sinusoidal or custom embeddings, enabling adaptive, band-aware denoising (Gao et al., 3 Jun 2026).
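One way such an embedding could be assembled is sketched below. The multiplicative composition of gates, schedule values, and reliability weights, the additive mixing through a projection matrix (a random stand-in for a learned linear layer), and the names `sinusoidal_embedding` and `band_aware_embedding` are all assumptions made for illustration; the cited work may compose these quantities differently.

```python
import numpy as np

def sinusoidal_embedding(step, dim):
    """Standard sinusoidal step embedding; dim is assumed even."""
    half = dim // 2
    freqs = np.exp(-np.log(10000.0) * np.arange(half) / half)
    angles = step * freqs
    return np.concatenate([np.sin(angles), np.cos(angles)])

def band_aware_embedding(step, dim, gates, schedule, reliability, proj):
    """Mix per-band weights into the step embedding.
    gates, schedule, reliability: arrays of shape (num_bands,).
    proj: array of shape (dim, num_bands), stand-in for a learned layer."""
    band_weights = gates * schedule * reliability   # assumed composition
    return sinusoidal_embedding(step, dim) + proj @ band_weights

rng = np.random.default_rng(0)
num_bands, dim = 4, 16
proj = rng.standard_normal((dim, num_bands))
gates = np.array([1.0, 1.0, 0.5, 0.0])        # hypothetical per-band gates
schedule = np.array([0.9, 0.7, 0.4, 0.1])     # hypothetical per-band schedule values
reliability = np.ones(num_bands)
emb = band_aware_embedding(10, dim, gates, schedule, reliability, proj)
```

With all gates set to zero the result reduces to the plain sinusoidal embedding, so the band-aware term acts as an additive correction.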

Advanced designs include cross-attention between time and frequency blocks, and spectral attention modules for learning SNR scaling.

4. Applications Across Modalities

TFCDiff methodologies have been validated in diverse domains:

Application	Domain(s)	Transform	Key Outcome	Reference
Time series forecasting	Temporal	Fourier, wavelet	Seasonality/trend preservation, MSE↓19–60% (DiffWave)	(Caldas et al., 29 Jan 2026)
ECG denoising	Biomedical	DCT	Best-in-class robustness/ImSNR, wearable suitability	(Li et al., 20 Nov 2025)
RF signal generation	Complex-valued	FFT (time-freq)	High SSIM/FID, Wi-Fi, FMCW, and CSI estimation ↑	(Chi et al., 2024)
Data unlearning	Images/text	FFT (images)	Targeted, minimal-harm forgetting, faster convergence	(Park et al., 20 Oct 2025)
Quantum transport	Lattice models	n/a (theory)	Unified D from real/momentum/freq domains	(Richter et al., 2018)
Time-series imputation	Temporal	rDFT	SOTA mid/high-freq imputation performance	(Gao et al., 3 Jun 2026)

This breadth demonstrates that time-frequency complementary approaches provide benefits in preserving informative structure, domain-prioritizing denoising, or controlling selective information removal.

5. Algorithms and Training Methodologies

A typical TFCDiff training pipeline involves:

Staged, coupled, or masked noise injection across both domains, often guided by spectral-energy-aware schedules.
Step- or band-conditioned embeddings added to the denoising network inputs, supplying explicit spectral location or variance context.
Loss functions incorporating not only standard noise-matching or ELBO-based terms, but also per-component (spectral band), per-stage consistency losses, and, in some cases, task-specific objectives (e.g., for data unlearning or imputation) (Caldas et al., 29 Jan 2026, Gao et al., 3 Jun 2026, Park et al., 20 Oct 2025).
Sampling/inference involves reverse traversal through the staged or coupled domain spaces, “peeling off” noise or blur according to the designed schedule.
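The loss structure described in the list above can be sketched as a standard noise-matching term plus a weighted per-band term. The equal-width band split, the weighting scheme, the trade-off factor `gamma`, and the function names are assumptions of this sketch, not the objective of any cited paper.

```python
import numpy as np

def noise_matching_loss(eps_pred, eps_true):
    """Standard DDPM-style mean squared error on the predicted noise."""
    return float(np.mean((eps_pred - eps_true) ** 2))

def bandwise_loss(eps_pred, eps_true, band_weights):
    """Weighted mean squared spectral error per band (equal-width bands assumed)."""
    err = np.abs(np.fft.rfft(eps_pred - eps_true, norm="ortho")) ** 2
    edges = np.linspace(0, err.shape[-1], len(band_weights) + 1).astype(int)
    per_band = np.array([err[lo:hi].mean() for lo, hi in zip(edges[:-1], edges[1:])])
    return float(np.sum(band_weights * per_band))

def total_loss(eps_pred, eps_true, band_weights, gamma=0.1):
    """Noise-matching term plus gamma times the per-band term."""
    return (noise_matching_loss(eps_pred, eps_true)
            + gamma * bandwise_loss(eps_pred, eps_true, band_weights))

rng = np.random.default_rng(0)
eps_true = rng.standard_normal(128)
eps_pred = eps_true + 0.1 * rng.standard_normal(128)   # stand-in for a network output
weights = np.array([1.0, 0.5, 0.25, 0.125])            # hypothetical band weights
loss = total_loss(eps_pred, eps_true, weights)
```

In a real pipeline `eps_pred` would come from the denoising network and the loss would be differentiated by an autodiff framework; NumPy is used here only to show the structure of the terms.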

In data unlearning, masking is performed over both time (diffusion steps) and frequency (band ranges), with gradient updates focused on the desired regions of the time-frequency plane, that is, diffusion step versus frequency band (Park et al., 20 Oct 2025).
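A joint mask over diffusion steps and frequency bins can be illustrated as follows. The rectangular mask, the use of rFFT bins as the frequency axis, and the names `time_frequency_mask` and `mask_update` are hypothetical choices for this sketch and are not taken from the cited unlearning method.

```python
import numpy as np

def time_frequency_mask(num_steps, num_bins, step_range, bin_range):
    """Binary mask that is 1 inside the selected (step, frequency-bin) rectangle."""
    mask = np.zeros((num_steps, num_bins))
    mask[step_range[0]:step_range[1], bin_range[0]:bin_range[1]] = 1.0
    return mask

def mask_update(update, mask_row):
    """Keep only the masked frequency bins of a real 1-D update vector."""
    spec = np.fft.rfft(update)
    return np.fft.irfft(spec * mask_row, n=update.shape[-1])

rng = np.random.default_rng(0)
n, num_steps = 64, 100
num_bins = n // 2 + 1
mask = time_frequency_mask(num_steps, num_bins, step_range=(20, 60), bin_range=(8, 16))
update = rng.standard_normal(n)           # stand-in for a gradient at one step
kept = mask_update(update, mask[30])      # step 30 lies inside the selected range
dropped = mask_update(update, mask[5])    # step 5 lies outside, so nothing passes
```

Updates at steps outside the selected range vanish entirely, and updates inside it are restricted to the chosen band, which is the sense in which the edit is localized in both dimensions.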

6. Empirical Results and Domain-Specific Benchmarks

The cited studies report that TFCDiff strategies outperform vanilla time-only or frequency-only diffusion in fidelity, robustness, and specificity:

Time series forecasting (DiffWave, S4, Sashimi backbones): consistent MSE and MAE improvements, strongest in highly periodic data (Caldas et al., 29 Jan 2026).
ECG denoising: superior SSD, MAD, PRD, ImSNR, CosSim on both synthesized and real-world datasets, robust to mixed and strong noise scenarios (Li et al., 20 Nov 2025).
RF signal generation: leading complex-valued SSIM and FID, effective in downstream classification and channel estimation (Chi et al., 2024).
Data unlearning: improved normalized SSCD, higher prompt deletion rates, and better retention of overall model fidelity with targeted, minimal-harm forgetting (Park et al., 20 Oct 2025).
Time-series imputation: significant gains in recovery of both trends and fluctuations under high-missingness, enabled by spectral/temporal scheduling (Gao et al., 3 Jun 2026).

Reported compute overhead is modest in most cases, as the decomposition and dual-branch processing add little cost (e.g., <8% extra training time with FFT in time series).

7. Extensions, Limitations, and Future Directions

TFCDiff presents a highly extensible paradigm. Notable recommendations and possible future directions, as suggested by the literature, include:

Adopting alternative invertible transforms (DCT, wavelets, rDFT) depending on the domain or signal class (Li et al., 20 Nov 2025, Gao et al., 3 Jun 2026).
Extending the approach to continuous-time SDE formulations for fully generalizable score-based modeling in both domains (Chi et al., 2024, Gao et al., 3 Jun 2026).
Developing adaptive spectral attention or policy-learning for optimal band/schedule selection, potentially replacing hand-tuned masks or schedules (Park et al., 20 Oct 2025, Caldas et al., 29 Jan 2026).
Exploring hybrid and multi-resolution sampling, as well as cascaded or cross-attention architectures bridging temporal and frequency networks.
Addressing domain-specific challenges such as spectral leakage, transform non-stationarity, and frequency-dependent imputation reliability.

The TFCDiff framework provides a unified, modular blueprint for a new generation of diffusion models where domain interplay and spectral structure are integral to the core process, yielding consistent gains in information preservation, reconstruction quality, and task flexibility across machine learning and physical modeling applications (Caldas et al., 29 Jan 2026, Li et al., 20 Nov 2025, Chi et al., 2024, Park et al., 20 Oct 2025, Gao et al., 3 Jun 2026, Richter et al., 2018).