TFCDiff: Time-Frequency Diffusion Model

Updated 2 July 2026
  • Time-Frequency Complementary Diffusion (TFCDiff) is a diffusion-based paradigm that integrates staged noise injection in both time and frequency domains to preserve key signal structures.
  • It employs reversible spectral decompositions and dual-branch architectures that adaptively manage noise schedules and band-wise embeddings for enhanced denoising.
  • TFCDiff has demonstrated measurable improvements in applications such as time series forecasting, ECG denoising, and RF signal generation with performance gains in MSE, SSIM, and other fidelity metrics.

Time-Frequency Complementary Diffusion (TFCDiff) refers to a family of diffusion-based paradigms in which the forward and/or reverse stochastic processes apply noise or iterative transformations in both the time and frequency domains. This contrasts with classical approaches, which apply isotropic noise directly in the time (or pixel) space, and aims to better preserve, manipulate, and reconstruct structured temporal or spectral patterns. Supporting evidence comes from time series forecasting and imputation, medical signal denoising, RF signal generation, data unlearning, and quantum many-body transport. Core elements include staged or coupled domain-wise noise injection, reversible spectral decomposition, domain-tailored denoising architectures, and frequency-aware scheduling or masking.

1. Theoretical Foundations and Mathematical Formulation

Time-Frequency Complementary Diffusion frameworks generalize denoising diffusion probabilistic models (DDPMs) to jointly or sequentially process temporal and spectral representations. The standard DDPM forward process is $q(x_t \mid x_{t-1}) = \mathcal{N}(x_t;\ \sqrt{1-\beta_t}\,x_{t-1},\ \beta_t I)$, with closed form $x_t = \sqrt{\bar{\alpha}_t}\,x_0 + \sqrt{1-\bar{\alpha}_t}\,\epsilon$, $\epsilon \sim \mathcal{N}(0, I)$, where $\bar{\alpha}_t = \prod_{s=1}^{t} (1-\beta_s)$.
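The closed form above can be sampled in one shot, without iterating the per-step kernel. The following is a minimal NumPy sketch; the linear `betas` schedule, the toy sine signal, and the function names are illustrative assumptions rather than anything specified by the source.

```python
import numpy as np

def alpha_bar(betas):
    """Cumulative product of (1 - beta_s) for s = 1..t (entry t-1 holds alpha_bar_t)."""
    return np.cumprod(1.0 - np.asarray(betas, dtype=float))

def q_sample(x0, t, betas, eps):
    """Closed-form forward sample x_t for a 1-indexed step t and given noise eps."""
    a_bar = alpha_bar(betas)[t - 1]
    return np.sqrt(a_bar) * x0 + np.sqrt(1.0 - a_bar) * eps

rng = np.random.default_rng(0)
betas = np.linspace(1e-4, 0.02, 100)            # assumed linear noise schedule
x0 = np.sin(np.linspace(0.0, 4.0 * np.pi, 64))  # toy clean signal
x_50 = q_sample(x0, 50, betas, rng.standard_normal(x0.shape))
```

Passing `eps` explicitly keeps the function deterministic, which makes it easy to check against the formula.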

In TFCDiff, the signal $x_0$ is decomposed by a lossless and invertible transform (Fourier, DCT, rDFT, or wavelet):

  • For spectral-stage decomposition, $x_0 = \sum_{k=1}^{K} f_0^{(k)}$, where $f_0^{(k)}$ are orthogonal frequency (or scale) bands.
  • Staged noise injection is applied per spectral component: $f_t^{(k)} = \sqrt{1-\beta_t}\, f_{t-1}^{(k)} + \sqrt{d_k \beta_t}\, \epsilon$, with $d_k = \mathbb{E}[|f_0^{(k)}|^2]$ capturing the energy of each component (Caldas et al., 29 Jan 2026).
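As an illustration of the two bullets above, the sketch below splits a real signal into disjoint FFT bands that sum back to the original, then applies one energy-scaled noise step to each band. The equal-width band edges, the use of the per-sample mean square as an empirical stand-in for $d_k$, and the function names are assumptions made for this example. It shows only the per-band update, not the ordering of stages.

```python
import numpy as np

def split_bands(x0, num_bands):
    """Split a real 1-D signal into disjoint frequency bands that sum to x0."""
    spectrum = np.fft.rfft(x0)
    edges = np.linspace(0, spectrum.size, num_bands + 1).astype(int)
    bands = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        masked = np.zeros_like(spectrum)
        masked[lo:hi] = spectrum[lo:hi]          # keep only this band's bins
        bands.append(np.fft.irfft(masked, n=x0.size))
    return np.stack(bands)

def staged_step(f_prev, beta_t, d_k, eps):
    """One forward step on a single band, with noise scaled by the band energy d_k."""
    return np.sqrt(1.0 - beta_t) * f_prev + np.sqrt(d_k * beta_t) * eps

rng = np.random.default_rng(0)
n = np.arange(128)
x0 = np.sin(2 * np.pi * 3 * n / 128) + 0.2 * np.sin(2 * np.pi * 40 * n / 128)
bands = split_bands(x0, num_bands=4)
d = np.mean(bands ** 2, axis=1)                  # empirical per-sample band energy
f1 = np.stack([staged_step(bands[k], 0.01, d[k], rng.standard_normal(n.size))
               for k in range(len(bands))])
```

Because the bands occupy disjoint frequency bins, they are mutually orthogonal, so the band energies in `d` add up to the mean power of `x0`.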

Alternatively, in coupled time-frequency injection, as in HyFAD and RF-Diffusion,

$$\begin{aligned} x_k^f &= \sqrt{\alpha_k^f}\,\mathcal{F}(x_{k-1}^t) + \sqrt{\beta_k^f}\sqrt{1-\lambda}\,(\Lambda \epsilon_k^f), \\ x_k^t &= \sqrt{\alpha_k^t}\,\mathcal{F}^{-1}(x_k^f) + \sqrt{\beta_k^t}\sqrt{\lambda}\,\epsilon_k^t, \end{aligned}$$

where $\lambda$ balances variance between time and frequency (Gao et al., 3 Jun 2026). In RF contexts, complex-valued signals are blurred in frequency and noised in time.
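The coupled update can be sketched for a complex-valued signal as follows. Several choices here are assumptions, not details taken from the cited works: $\mathcal{F}$ is taken to be the unitary FFT, $\Lambda$ is treated as a diagonal spectral weighting passed as the vector `weights`, the noise is circular complex Gaussian, and the schedule values are arbitrary.

```python
import numpy as np

def complex_noise(rng, n):
    """Unit-variance circular complex Gaussian noise."""
    return (rng.standard_normal(n) + 1j * rng.standard_normal(n)) / np.sqrt(2.0)

def coupled_step(x_prev, alpha_f, beta_f, alpha_t, beta_t, lam, weights, eps_f, eps_t):
    """One coupled forward step: frequency-domain update, then time-domain update."""
    # Frequency stage: transform, scale, add spectrally weighted noise.
    x_f = (np.sqrt(alpha_f) * np.fft.fft(x_prev, norm="ortho")
           + np.sqrt(beta_f) * np.sqrt(1.0 - lam) * (weights * eps_f))
    # Time stage: transform back, scale, add white noise.
    x_t = (np.sqrt(alpha_t) * np.fft.ifft(x_f, norm="ortho")
           + np.sqrt(beta_t) * np.sqrt(lam) * eps_t)
    return x_f, x_t

rng = np.random.default_rng(0)
n = 64
x_prev = np.exp(2j * np.pi * 5 * np.arange(n) / n)   # toy complex tone
weights = np.exp(-4.0 * np.abs(np.fft.fftfreq(n)))   # hypothetical diagonal of Lambda
x_f, x_t = coupled_step(x_prev, 0.99, 0.01, 0.99, 0.01, 0.5, weights,
                        complex_noise(rng, n), complex_noise(rng, n))
```

Setting `lam=1.0` removes the frequency-domain noise term and `lam=0.0` removes the time-domain one, which is one way to read the statement that $\lambda$ balances variance between the two domains.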

The reverse (denoising) process mirrors the forward structure (in either sequential or staged order), parameterized by neural networks, with explicit or learned band-wise scheduling and frequency-aware embeddings.

2. Signal Recovery, SNR, and Pattern Preservation

A core insight of TFCDiff is the structured preservation of high signal-to-noise ratio (SNR) for dominant frequencies or long-range temporal patterns. By injecting noise first in low-energy (fine-detail) spectral bands and deferring the corruption of high-energy (trend or periodic) components, the framework keeps crucial structural information intact for longer along the diffusion trajectory. In a staged approach, the total observed SNR at a given stage therefore depends on which bands have already been noised. Early stages “hold back” noise on dominant frequencies, so global structure is maintained for longer, which improves extrapolation of seasonality and trends in time series (Caldas et al., 29 Jan 2026) and denoising performance in biomedical signals (Li et al., 20 Nov 2025).
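For a single band, unrolling the per-band recursion from Section 1 gives a closed form and a per-band SNR. This is a derivation from the stated recursion, not the total-SNR expression of the cited work. It assumes $\epsilon$ has unit-variance entries and reads $d_k$ as the per-coefficient energy of band $k$. The accumulated noise variance obeys $v_t = (1-\beta_t)\,v_{t-1} + d_k \beta_t$ with $v_0 = 0$, which solves to $v_t = d_k (1-\bar{\alpha}_t)$.

```latex
\begin{aligned}
f_t^{(k)} &= \sqrt{\bar{\alpha}_t}\, f_0^{(k)} + \sqrt{d_k\,(1-\bar{\alpha}_t)}\;\epsilon,
  \qquad \epsilon \sim \mathcal{N}(0, I), \\
\mathrm{SNR}_t^{(k)} &= \frac{\bar{\alpha}_t\, \mathbb{E}\big[|f_0^{(k)}|^2\big]}{d_k\,(1-\bar{\alpha}_t)}
  = \frac{\bar{\alpha}_t}{1-\bar{\alpha}_t}.
\end{aligned}
```

Under this reading, a band that is being noised loses SNR at the same relative rate whatever its energy, so the staging order is what determines which bands are still clean at a given stage.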

In coarse-to-fine hybrid schedules, such as HyFAD, time-domain steps first recover large-scale low-frequency trends; frequency-domain steps then refine mid- and high-frequency structure, guided by step-dependent, band-wise embeddings (Gao et al., 3 Jun 2026).

3. Model Architectures and Step Embedding Strategies

TFCDiff is typically model-agnostic with respect to the score (denoising) network; it can be layered atop U-Net, S4-based, Transformer, or complex-valued backbones. However, architecture can be enhanced for domain-adaptive processing:

  • In frequency-domain denoising, inputs are spectral coefficients (e.g., DCT, FFT), often truncated to focus on relevant bands (Li et al., 20 Nov 2025, Caldas et al., 29 Jan 2026).
  • Hybrid dual-branch architectures perform sequential denoising in time and frequency, each branch equipped with its own step embedding reflecting noise schedules and spectral priorities (Gao et al., 3 Jun 2026).
  • Frequency-aware step embeddings modulate denoising attention to bands likely to survive noise at each step. Gates, schedules, and reliability weights are composed and mixed into sinusoidal or custom embeddings, enabling adaptive, band-aware denoising (Gao et al., 3 Jun 2026).
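One way to picture a frequency-aware step embedding is to add a projected vector of band gates to a standard sinusoidal step embedding. The sketch below is purely illustrative: the sigmoid gate shape, the `sharpness` parameter, and the random `proj` matrix (standing in for a learned projection) are hypothetical and are not the formulation of the cited work.

```python
import numpy as np

def sinusoidal_embedding(step, dim):
    """Standard sinusoidal embedding of a scalar diffusion step (dim must be even)."""
    half = dim // 2
    freqs = np.exp(-np.log(10000.0) * np.arange(half) / half)
    angles = step * freqs
    return np.concatenate([np.sin(angles), np.cos(angles)])

def band_gates(step, num_steps, num_bands, sharpness=4.0):
    """Hypothetical soft gates in (0, 1): at larger steps, fewer high bands stay open."""
    cutoff = num_bands * (1.0 - step / num_steps)
    return 1.0 / (1.0 + np.exp(sharpness * (np.arange(num_bands) + 0.5 - cutoff)))

def band_aware_embedding(step, num_steps, proj):
    """Step embedding plus a projection of the band gates; proj has shape (dim, num_bands)."""
    dim, num_bands = proj.shape
    return sinusoidal_embedding(step, dim) + proj @ band_gates(step, num_steps, num_bands)

rng = np.random.default_rng(0)
proj = 0.1 * rng.standard_normal((16, 4))   # stand-in for a learned projection
emb = band_aware_embedding(step=10, num_steps=100, proj=proj)
```

The resulting vector has the same size as an ordinary step embedding, so it could be fed to a denoiser wherever a plain step embedding is expected.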

Advanced designs include cross-attention between time and frequency blocks, and spectral attention modules for learning SNR scaling.

4. Applications Across Modalities

TFCDiff methodologies have been validated in diverse domains:

| Application | Domain(s) | Transform | Key Outcome | Reference |
|---|---|---|---|---|
| Time series forecasting | Temporal | Fourier, wavelet | Seasonality/trend preservation, MSE↓19–60% (DiffWave) | (Caldas et al., 29 Jan 2026) |
| ECG denoising | Biomedical | DCT | Best-in-class robustness/ImSNR, wearable suitability | (Li et al., 20 Nov 2025) |
| RF signal generation | Complex-valued | FFT (time-freq) | High SSIM/FID, Wi-Fi, FMCW, and CSI estimation ↑ | (Chi et al., 2024) |
| Data unlearning | Images/text | FFT (images) | Targeted, minimal-harm forgetting, faster convergence | (Park et al., 20 Oct 2025) |
| Quantum transport | Lattice models | n/a (theory) | Unified D from real/momentum/freq domains | (Richter et al., 2018) |
| Time-series imputation | Temporal | rDFT | SOTA mid/high-freq imputation performance | (Gao et al., 3 Jun 2026) |

This breadth demonstrates that time-frequency complementary approaches provide benefits in preserving informative structure, domain-prioritizing denoising, or controlling selective information removal.

5. Algorithms and Training Methodologies

A typical TFCDiff training pipeline involves:

  • Staged, coupled, or masked noise injection across both domains, often guided by spectral-energy-aware schedules.
  • Step- or band-conditioned embeddings added to the denoising network inputs, supplying explicit spectral location or variance context.
  • Loss functions incorporating not only standard noise-matching or ELBO-based terms, but also per-component (spectral band), per-stage consistency losses, and, in some cases, task-specific objectives (e.g., for data unlearning or imputation) (Caldas et al., 29 Jan 2026, Gao et al., 3 Jun 2026, Park et al., 20 Oct 2025).
  • Sampling/inference involves reverse traversal through the staged or coupled domain spaces, “peeling off” noise or blur according to the designed schedule.
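The per-component loss terms mentioned above can be illustrated with a band-weighted noise-matching loss computed in the frequency domain. This is a sketch under assumptions: the band edges, the band weights, and the perturbed `eps_pred` (standing in for a network output) are hypothetical, and the cited works may define their losses differently.

```python
import numpy as np

def band_weighted_loss(eps_pred, eps_true, band_edges, weights):
    """Weighted sum of per-band mean squared spectral error between predicted and true noise."""
    err = np.abs(np.fft.rfft(eps_pred - eps_true, norm="ortho")) ** 2
    per_band = np.array([err[lo:hi].mean()
                         for lo, hi in zip(band_edges[:-1], band_edges[1:])])
    return float(np.dot(weights, per_band)), per_band

rng = np.random.default_rng(0)
eps_true = rng.standard_normal(128)
eps_pred = eps_true + 0.1 * rng.standard_normal(128)   # stand-in for a network output
edges = [0, 16, 32, 48, 65]                            # 4 bands over the 65 rfft bins
band_weights = np.array([1.0, 1.0, 0.5, 0.5])          # hypothetical band weights
loss, per_band = band_weighted_loss(eps_pred, eps_true, edges, band_weights)
```

Returning `per_band` alongside the scalar makes it possible to monitor which bands dominate the error during training.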

In data unlearning, masking is performed over both time (diffusion steps) and frequency (band ranges), with gradient updates focused on the desired regions of the (diffusion step, frequency band) plane (Park et al., 20 Oct 2025).
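A rectangular mask over diffusion steps and frequency bands is one simple way to restrict updates to a chosen region. The sketch below assumes half-open index ranges and a precomputed per-cell loss array; both are illustrative choices, and the masks in the cited work need not be rectangular.

```python
import numpy as np

def step_band_mask(num_steps, num_bands, step_range, band_range):
    """Boolean mask over (diffusion step, frequency band); True marks cells that receive updates."""
    mask = np.zeros((num_steps, num_bands), dtype=bool)
    mask[step_range[0]:step_range[1], band_range[0]:band_range[1]] = True
    return mask

def masked_loss(per_cell_loss, mask):
    """Mean loss over the selected (step, band) cells only."""
    return float(per_cell_loss[mask].mean())

rng = np.random.default_rng(0)
per_cell_loss = rng.random((100, 8))   # stand-in for losses per step and band
mask = step_band_mask(100, 8, step_range=(20, 60), band_range=(2, 5))
loss = masked_loss(per_cell_loss, mask)
```

Cells outside the mask contribute nothing to `loss`, so they would receive no gradient if this value were used as the training objective.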

6. Empirical Results and Domain-Specific Benchmarks

The cited studies report that TFCDiff strategies outperform vanilla time-only or frequency-only diffusion in fidelity, robustness, and specificity:

  • Time series forecasting (DiffWave, S4, Sashimi backbones): consistent MSE and MAE improvements, strongest in highly periodic data (Caldas et al., 29 Jan 2026).
  • ECG denoising: superior SSD, MAD, PRD, ImSNR, CosSim on both synthesized and real-world datasets, robust to mixed and strong noise scenarios (Li et al., 20 Nov 2025).
  • RF signal generation: leading complex-valued SSIM and FID, effective in downstream classification and channel estimation (Chi et al., 2024).
  • Data unlearning: improved normalized SSCD, higher prompt deletion rates, and better retention of overall model fidelity with targeted, minimal-harm forgetting (Park et al., 20 Oct 2025).
  • Time-series imputation: significant gains in recovery of both trends and fluctuations under high-missingness, enabled by spectral/temporal scheduling (Gao et al., 3 Jun 2026).

Reported compute overhead is small in most cases: the decomposition and dual-branch processing add only modest cost (e.g., <8% extra training time with FFT in time series).

7. Extensions, Limitations, and Future Directions

TFCDiff presents a highly extensible paradigm. Notable recommendations and possible future directions, as suggested by the literature, include:

  • Adopting alternative invertible transforms (DCT, wavelets, rDFT) depending on the domain or signal class (Li et al., 20 Nov 2025, Gao et al., 3 Jun 2026).
  • Extending the approach to continuous-time SDE formulations for fully generalizable score-based modeling in both domains (Chi et al., 2024, Gao et al., 3 Jun 2026).
  • Developing adaptive spectral attention or policy-learning for optimal band/schedule selection, potentially replacing hand-tuned masks or schedules (Park et al., 20 Oct 2025, Caldas et al., 29 Jan 2026).
  • Exploring hybrid and multi-resolution sampling, as well as cascaded or cross-attention architectures bridging temporal and frequency networks.
  • Addressing domain-specific challenges such as spectral leakage, transform non-stationarity, and frequency-dependent imputation reliability.

The TFCDiff framework provides a unified, modular blueprint for a new generation of diffusion models where domain interplay and spectral structure are integral to the core process, yielding consistent gains in information preservation, reconstruction quality, and task flexibility across machine learning and physical modeling applications (Caldas et al., 29 Jan 2026, Li et al., 20 Nov 2025, Chi et al., 2024, Park et al., 20 Oct 2025, Gao et al., 3 Jun 2026, Richter et al., 2018).
