Continuous DDIM Channel Model
- The paper introduces a continuous DDIM-based channel model that unifies diffusion processes with delay–Doppler representations to generate realistic, user-specific wireless channels.
- It employs deterministic reverse DDIM sampling with 3D positional conditioning and a U-Net architecture to preserve key physical properties and spatial statistics.
- Empirical results show significant improvements in metrics such as NMSE and beam alignment, achieving performance within 1 dB of full real-data benchmarks.
A Continuous DDIM-based channel refers to a model of wireless propagation in which high-dimensional, user-dependent wireless channels are generated or represented through the continuous (and often conditional) formulation of Denoising Diffusion Implicit Models (DDIM). This integrates the mathematical rigor of continuous-time diffusion processes from stochastic differential equations with the structure of delay–Doppler (DD) channel models and leverages powerful neural generative models to efficiently synthesize channel realizations faithful to site-specific physical characteristics and measurements (Lee et al., 5 Sep 2024, Tong, 4 Oct 2025).
1. Mathematical Foundations: Diffusion Models and Continuous DD Channel Representation
At the core, a continuous DDIM-based channel model blends two advanced frameworks: the mathematical formalization of physical wireless channels in the delay–Doppler domain and the continuous-time generative processes of diffusion models.
The delay–Doppler (DD) modulation representation for a physical, linear time-varying channel is

$$y(t) = \iint S(\tau, \nu)\, x(t - \tau)\, e^{j 2\pi \nu (t - \tau)}\, d\tau\, d\nu,$$

where $S(\tau, \nu)$ is the DD spreading function encoding multipath delay ($\tau$) and Doppler shift ($\nu$). The observable channel is effectively band-limited in the delay–Doppler domain due to practical transmit and receive time/frequency windows, which, after convolution, restrict the effective support of $S(\tau, \nu)$.
Sampling theory then admits an on-grid equivalent (Universal On-Grid DDIM Model) under mild assumptions: the continuous channel is represented by a discrete tap array $h[k, l]$ with grid spacings $\Delta\tau = 1/B$ and $\Delta\nu = 1/T$, where $B$ is the bandwidth, $T$ is the observation duration, and the on-grid taps correspond to sampled and windowed versions of $S(\tau, \nu)$ (Tong, 4 Oct 2025).
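As an illustrative sketch (not drawn from either paper), the following Python snippet computes the on-grid resolutions and tap-array dimensions implied by these relations for a hypothetical system; the bandwidth, duration, and spread values are assumptions chosen only for the example.

```python
import numpy as np

# Hypothetical system parameters (assumptions for illustration only).
B = 100e6          # bandwidth in Hz
T = 1e-3           # observation duration in s
tau_max = 5e-6     # maximum multipath delay in s
nu_max = 2e3       # maximum Doppler shift in Hz (one-sided)

# On-grid resolutions implied by sampling theory: delay spacing 1/B,
# Doppler spacing 1/T.
delta_tau = 1.0 / B
delta_nu = 1.0 / T

# Number of delay and Doppler taps needed to cover the effective support.
K = int(np.ceil(tau_max / delta_tau)) + 1        # delay taps
L = 2 * int(np.ceil(nu_max / delta_nu)) + 1      # Doppler taps (two-sided)

print(f"delay spacing   = {delta_tau*1e9:.1f} ns, {K} taps")
print(f"Doppler spacing = {delta_nu:.1f} Hz, {L} taps")
```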
On the generative side, continuous-time diffusion models provide a mathematically principled framework to synthesize samples from high-dimensional distributions. In the continuous-time limit, the forward process is defined by an Itô SDE

$$d\mathbf{h}_t = f(\mathbf{h}_t, t)\, dt + g(t)\, d\mathbf{w}_t,$$

where $\mathbf{h}_t$ encodes the (real-valued) channel vector and $\mathbf{w}_t$ is a standard Wiener process. The reverse DDIM construction allows for deterministic sampling using learned score approximators $\boldsymbol{\epsilon}_\theta$ (the neural network estimate of the noise) (Lee et al., 5 Sep 2024).
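A minimal Python sketch of this forward process, assuming a variance-preserving SDE with a linear noise schedule discretized by Euler–Maruyama; the schedule values and channel dimension are illustrative assumptions, not taken from the cited work.

```python
import numpy as np

rng = np.random.default_rng(0)

def beta(t):
    """Assumed linear noise schedule beta(t) on t in [0, 1] (illustrative values)."""
    return 0.1 + (20.0 - 0.1) * t

def forward_sde_em(h0, num_steps=1000):
    """Euler-Maruyama simulation of the variance-preserving forward SDE
    dh = -0.5 * beta(t) * h dt + sqrt(beta(t)) dw on t in [0, 1]."""
    dt = 1.0 / num_steps
    h = h0.copy()
    for n in range(num_steps):
        t = n * dt
        dw = rng.standard_normal(h.shape) * np.sqrt(dt)
        h += -0.5 * beta(t) * h * dt + np.sqrt(beta(t)) * dw
    return h

h0 = rng.standard_normal(128)   # placeholder real-valued channel vector
hT = forward_sde_em(h0)         # approximately N(0, I) after the forward process
```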
2. Conditional Generative Mechanisms in the Continuous DDIM Channel Model
To synthesize physically meaningful, user-specific channel samples, the generative process is conditioned on user position. The input conditioning mechanism maps the 3D position vector $\mathbf{p}$ (e.g., coordinates in a deployment setting) into an embedding vector via an MLP or sinusoidal encoder. This embedding is fused with the temporal embedding at each step in the U-Net backbone, typically through FiLM-style (feature-wise linear modulation) affine modulations or concatenation, ensuring that generative outputs are sampled from the conditional law $p(\mathbf{h} \mid \mathbf{p})$.
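A hedged PyTorch sketch of this conditioning path, assuming an MLP position encoder whose output is fused with the time embedding to produce FiLM scale/shift parameters; module names and sizes are illustrative and not the authors' implementation.

```python
import torch
import torch.nn as nn

class FiLMCondition(nn.Module):
    """Fuse a diffusion-time embedding with a 3D position embedding and emit
    per-channel FiLM scale/shift for a U-Net feature map (illustrative sketch)."""
    def __init__(self, channels: int, emb_dim: int = 128):
        super().__init__()
        self.pos_mlp = nn.Sequential(nn.Linear(3, emb_dim), nn.SiLU(),
                                     nn.Linear(emb_dim, emb_dim))
        self.to_film = nn.Linear(2 * emb_dim, 2 * channels)  # scale and shift

    def forward(self, feat, t_emb, pos):
        # feat: (B, C, H, W) feature map; t_emb: (B, emb_dim); pos: (B, 3)
        cond = torch.cat([t_emb, self.pos_mlp(pos)], dim=-1)
        scale, shift = self.to_film(cond).chunk(2, dim=-1)
        return feat * (1 + scale[:, :, None, None]) + shift[:, :, None, None]
```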
The training loss is a classical DDIM mean-squared error objective,

$$\mathcal{L}(\theta) = \mathbb{E}_{\mathbf{h}_0, \mathbf{p}, t, \boldsymbol{\epsilon}}\left[ \left\| \boldsymbol{\epsilon} - \boldsymbol{\epsilon}_\theta\!\left( \sqrt{\bar{\alpha}_t}\, \mathbf{h}_0 + \sqrt{1 - \bar{\alpha}_t}\, \boldsymbol{\epsilon},\; t,\; \mathbf{p} \right) \right\|^2 \right],$$

where, for each ground-truth channel sample $\mathbf{h}_0$, user position $\mathbf{p}$, and diffusion time $t$, the objective pushes the network to accurately denoise the noisy channel conditioned on location (Lee et al., 5 Sep 2024).
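The objective can be implemented as a simple noise-prediction training step; the sketch below assumes a model interface `eps_model(h_t, t, pos)` and a precomputed cumulative schedule `alphas_bar`, both of which are illustrative choices.

```python
import torch
import torch.nn.functional as F

def ddim_training_step(eps_model, h0, pos, alphas_bar):
    """One noise-prediction training step for the conditional DDIM objective.

    h0:  (B, ...) clean channel tensors (real-valued beamspace representation)
    pos: (B, 3)   user positions used as conditioning
    alphas_bar: (T,) cumulative product of (1 - beta_t)
    """
    B = h0.shape[0]
    t = torch.randint(0, alphas_bar.shape[0], (B,), device=h0.device)
    abar = alphas_bar[t].view(B, *([1] * (h0.dim() - 1)))
    eps = torch.randn_like(h0)
    h_t = abar.sqrt() * h0 + (1 - abar).sqrt() * eps   # forward-noised channel
    loss = F.mse_loss(eps_model(h_t, t, pos), eps)     # predict the injected noise
    return loss
```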
3. Deterministic Sampling and Architecture: Algorithmic Implementation
The synthesis of a high-dimensional user-specific channel proceeds by initializing a Gaussian noise vector $\mathbf{h}_T \sim \mathcal{N}(\mathbf{0}, \mathbf{I})$ and iteratively applying deterministic reverse DDIM steps. At each iteration, given the current state $\mathbf{h}_t$, time $t$, and position embedding of $\mathbf{p}$, the model predicts the noise $\boldsymbol{\epsilon}_\theta(\mathbf{h}_t, t, \mathbf{p})$ and reconstructs the clean sample

$$\hat{\mathbf{h}}_0 = \frac{\mathbf{h}_t - \sqrt{1 - \bar{\alpha}_t}\, \boldsymbol{\epsilon}_\theta(\mathbf{h}_t, t, \mathbf{p})}{\sqrt{\bar{\alpha}_t}},$$

followed by the update

$$\mathbf{h}_{t-1} = \sqrt{\bar{\alpha}_{t-1}}\, \hat{\mathbf{h}}_0 + \sqrt{1 - \bar{\alpha}_{t-1}}\, \boldsymbol{\epsilon}_\theta(\mathbf{h}_t, t, \mathbf{p}).$$
This process is implicit and non-stochastic (no additional sampled noise), resulting in fast and reproducible channel generation once the model is trained (Lee et al., 5 Sep 2024).
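The two update equations translate directly into a short sampling loop; the PyTorch sketch below assumes the same `eps_model` interface and `alphas_bar` schedule as above and uses a uniformly spaced subset of steps, which is one common but not the only choice.

```python
import torch

@torch.no_grad()
def cddim_sample(eps_model, pos, shape, alphas_bar, num_steps=256):
    """Deterministic (eta = 0) reverse DDIM sampling of a position-conditioned channel."""
    device = pos.device
    h = torch.randn(shape, device=device)   # h_T ~ N(0, I)
    step_ids = torch.linspace(alphas_bar.shape[0] - 1, 0, num_steps).round().long().tolist()
    for i, t in enumerate(step_ids):
        abar_t = alphas_bar[t]
        abar_prev = alphas_bar[step_ids[i + 1]] if i + 1 < num_steps else torch.tensor(1.0, device=device)
        t_batch = torch.full((shape[0],), t, device=device, dtype=torch.long)
        eps = eps_model(h, t_batch, pos)                              # predicted noise
        h0_hat = (h - (1 - abar_t).sqrt() * eps) / abar_t.sqrt()      # clean-sample estimate
        h = abar_prev.sqrt() * h0_hat + (1 - abar_prev).sqrt() * eps  # deterministic update
    return h
```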
The U-Net architecture is critical: it processes the real-valued beamspace channel tensors, incorporates group normalization and SiLU activations, and leverages skip connections for multiresolution feature propagation.
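As a hedged illustration of these building blocks (group normalization, SiLU activations, residual/skip connections), a minimal residual block might look as follows; the layer sizes and overall topology are assumptions rather than the paper's exact architecture.

```python
import torch
import torch.nn as nn

class ResBlock(nn.Module):
    """GroupNorm + SiLU convolutional block with a residual skip connection,
    in the spirit of the U-Net backbone described above (illustrative only).
    Channel counts are assumed to be multiples of the group count."""
    def __init__(self, in_ch: int, out_ch: int, groups: int = 8):
        super().__init__()
        self.block = nn.Sequential(
            nn.GroupNorm(groups, in_ch), nn.SiLU(),
            nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1),
            nn.GroupNorm(groups, out_ch), nn.SiLU(),
            nn.Conv2d(out_ch, out_ch, kernel_size=3, padding=1),
        )
        self.skip = nn.Conv2d(in_ch, out_ch, kernel_size=1) if in_ch != out_ch else nn.Identity()

    def forward(self, x):
        return self.block(x) + self.skip(x)
```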
4. Physical Fidelity and Preservation of Wireless Channel Properties
Key physical properties—large-scale fading, path loss, angle-of-arrival statistics, spatial correlation due to multipath clustering—are preserved by adopting a beamspace representation. Specifically, unitary DFT transforms are applied along the antenna dimensions such that the channel is sparse and peaked along physical angles, facilitating learning of consistent, realistic spatial statistics.
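A numpy sketch of the beamspace transform for a narrowband MIMO channel matrix, assuming unitary DFT matrices matched to uniform linear arrays at both ends; the array sizes and angles are illustrative.

```python
import numpy as np

def unitary_dft(n: int) -> np.ndarray:
    """Unitary DFT matrix of size n (columns act as ULA beamforming directions)."""
    k = np.arange(n)
    return np.exp(-2j * np.pi * np.outer(k, k) / n) / np.sqrt(n)

def to_beamspace(H: np.ndarray) -> np.ndarray:
    """Map an antenna-domain channel H (Nr x Nt) to its beamspace representation."""
    F_r = unitary_dft(H.shape[0])
    F_t = unitary_dft(H.shape[1])
    return F_r.conj().T @ H @ F_t

# Example: a single-path channel becomes approximately one-sparse in beamspace.
Nr, Nt = 16, 64
ar = np.exp(1j * np.pi * np.arange(Nr) * np.sin(0.3)) / np.sqrt(Nr)   # receive steering vector
at = np.exp(1j * np.pi * np.arange(Nt) * np.sin(-0.5)) / np.sqrt(Nt)  # transmit steering vector
H = np.outer(ar, at.conj())
Hb = to_beamspace(H)   # energy concentrates around a few beam indices
```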
Smooth conditioning on user location ensures that synthesized channels interpolate accurately between physically meaningful states as user position varies. The implicit smoothness induced by noise-annealed training further constrains generated samples to remain on the physically plausible channel manifold. In practice, generated channels maintain second-order statistics (covariance, angular spread) and reproduce large-scale fading and beam spread within 1 dB of ground-truth measurements (Lee et al., 5 Sep 2024).
5. Empirical Validation: Data Augmentation and Downstream Utility
Evaluation in simulation environments—QuaDRiGa urban macro LOS (28 GHz) and DeepMIMO 28 GHz with outdoor blockage—demonstrates substantial gains from cDDIM-based channel synthesis over competing approaches (Gaussian-noise augmentation and conditional GANs). Performance metrics include:
- Peak-index match probability: cDDIM achieves approximately 0.6, while cGANs fall below the random-guess level (0.024) and the closest-UE lookup baseline attains 0.55.
- Channel compression (CRNet, NMSE): With 1,000 true training samples, direct training yields NMSE ~–10 dB, while augmentation to 90,000 samples via cDDIM improves NMSE to ~–17 dB (within 1 dB of full real-data performance). Other augmentations are 1–2 dB worse.
- Beam alignment (BAE, average post-selection SNR): Using mostly cDDIM-synthesized samples, BAE achieves SNR within 1 dB of full-data and outperforms both traditional and rival neural augmentation baselines. cGAN and noise-augmented methods fail to match performance and can yield negative SNR (worse than noise floor).
This shows that cDDIM-based channels can enable learning in regimes where measurement scarcity would otherwise limit performance, drastically enhancing data-driven physical/MAC-layer wireless design (Lee et al., 5 Sep 2024).
6. Integration with Continuous-Time Delay–Doppler Models and Practical Implications
Formally, cDDIM-based channel synthesis naturally complements the universal on-grid DDIM (delay–Doppler impulse modulation) framework, where continuous physical propagation effects are exactly—under finite-support windowing—represented by a discrete (on-grid) tap array with grid spacings $\Delta\tau = 1/B$ and $\Delta\nu = 1/T$ (Tong, 4 Oct 2025). Each physical (generally off-grid) multipath component contributes to multiple taps through deterministic window-induced leakage, as illustrated in the sketch below.
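The leakage mechanism can be made concrete with a small numpy example, assuming an ideal rectangular band-limiting window whose sampled response is a sinc kernel; the bandwidth and delay values are assumptions for illustration.

```python
import numpy as np

B = 100e6                   # bandwidth in Hz (assumed)
delta_tau = 1.0 / B         # delay-grid spacing
tau0 = 3.4 * delta_tau      # an off-grid path delay (fractional tap position)

# Band-limiting with an ideal rectangular window turns a single off-grid delay
# into a sinc-shaped spread across the on-grid delay taps.
k = np.arange(0, 16)                   # delay-tap indices
taps = np.sinc(k - tau0 / delta_tau)   # sampled sinc kernel centered at the true delay

# Energy is concentrated near tap 3, but neighboring taps receive
# non-negligible leakage because tau0 does not lie on the grid.
leakage_fraction = 1 - np.abs(taps[3])**2 / np.sum(np.abs(taps)**2)
print(f"fraction of energy leaked away from the nearest tap: {leakage_fraction:.2f}")
```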
Table: Connections between On-Grid DDIM and cDDIM-based Generative Models
| Attribute | On-Grid DDIM Channel (Tong, 4 Oct 2025) | cDDIM Channel Synthesis (Lee et al., 5 Sep 2024) |
|---|---|---|
| Representation | On-grid taps $h[k, l]$ (delay–Doppler grid) | Channel tensor in beamspace (conditional) |
| Physical Basis | Derived from S(τ, ν), scattering | Learned manifold via SDE–score matching |
| Data Constraints | Measurement, window-induced limits | Overcome scarcity via generative model |
| Conditionality | Typically site-wide or scenario-wide | User-specific via explicit location input |
This duality enables physically principled, data-driven wireless models that are compatible with advanced digital signal processing frameworks (OTFS, ODDM) and neural network-based wireless system design.
7. Scalability, Sample Efficiency, and Future Research Directions
cDDIM-based channel modeling exhibits several advantageous properties:
- Scalability: By working in the beamspace domain and leveraging sparsity, the scheme scales to hundreds of antennas with only linear growth in network size and training resources, without changing the training recipe.
- Sample Efficiency: Empirical and analytic results indicate that 1,000–10,000 real measurements suffice for accurate modeling, and synthetic augmentation with cDDIM recovers >90% of downstream task performance even with 1% of the full dataset.
- Inference Speed: Standard DDIM requires 256 sampling steps; distillation methods reduce this to 32 with modest accuracy loss, but further reduction degrades fidelity.
- Physical Plausibility and Modeling Residuals: The delay–Doppler, on-grid equivalence framework emphasizes the importance of window selection (e.g., use of smooth windows to minimize leakage), careful tap-correlation modeling, and the risks of under-representing off-grid phenomena.
A plausible implication is that further research could enhance parameter efficiency or reduce sampling times through improved network architectures and consistency-based distillation, as well as integrate learned generative architectures with principled on-grid DDIM parameterizations for interpretable, efficient joint modeling and inference (Lee et al., 5 Sep 2024, Tong, 4 Oct 2025).