Channel-wise Conditional Prompting (CCP)
- Channel-wise Conditional Prompting (CCP) is a method integrating channel state into DeepJSCC, enabling dynamic adaptation to varying wireless conditions.
- It employs a Channel State Prompt (CSP) mechanism that modulates intermediate feature maps via element-wise multiplication and Transformer fusion.
- Reported results show PSNR gains of roughly 1 dB over an SNR-concatenation ablation and a smaller two-channel storage footprint than baseline models, indicating CCP's scalability and robustness across diverse channel environments.
Channel-wise Conditional Prompting (CCP), as instantiated by the Channel State Prompt (CSP) mechanism in Prompt JSCC (PJSCC), is a differentiable architectural approach for integrating physical channel state information into deep learning-based joint source-channel coding (DeepJSCC). This design enables wireless semantic communication systems to adapt dynamically to varying signal-to-noise ratios (SNR) and channel distributions without the need for multiple specialized models, addressing fundamental generalization limitations in prior DeepJSCC architectures (Zhang et al., 2024).
1. Architectural Structure of CSP in DeepJSCC
The CSP module is embedded within the PJSCC pipeline between Swin-Transformer stages in both the encoder ($f_\phi$) and decoder ($g_\theta$) branches. In the encoder, an intermediate feature map $\mathbf{F} \in \mathbb{R}^{H \times W \times C}$ produced by the preceding Transformer Feature Extraction (TFE) blocks is passed to the CSP:
- GAP+MLP Feature Extraction: Global Average Pooling (GAP) reduces $\mathbf{F}$ spatially, and an MLP maps the pooled result to a compact vector $\mathbf{z}$ representing channel-aware features.
- Prompt Retrieval: Based on the current channel state $c$, a prompt tensor $\mathbf{P}_{l,c}$ for CSP stage $l$ is selected from the prompt set $\mathcal{P}$.
- Element-wise Modulation: $\mathbf{z}$ and $\mathbf{P}_{l,c}$ are combined by element-wise multiplication to yield the modulated feature $\mathbf{M}$.
- Spatial Expansion: A convolution is applied to $\mathbf{M}$ to restore the spatial dimensions of the feature map, giving $\tilde{\mathbf{F}}$.
- Transformer Fusion: $\tilde{\mathbf{F}}$ is concatenated with the original $\mathbf{F}$ and processed through a Transformer block and a convolution to yield the output feature $\mathbf{F}'$.
Decoder-side CSPs are structurally identical. They operate on feature maps within $g_\theta$ and use their own dedicated prompt tensors. CSP modules are inserted between each pair of Transformer blocks along both the encoding and decoding paths (Zhang et al., 2024).
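The five CSP steps can be traced in a minimal NumPy sketch of one forward pass. It is an illustration under stated assumptions, not the PJSCC implementation. All of the following are hypothetical choices: the channel-first (C, H, W) layout, the names `csp_forward` and `init_params`, the GELU MLP, the 1×1 convolutions, and the single-head self-attention that stands in for the (Swin) Transformer block. Following Section 7, the prompt is assumed to have the same shape as the feature map. The compact vector z = MLP(GAP(F)) is therefore broadcast over the prompt's spatial positions, and the "spatial expansion" convolution here preserves shape.

```python
import numpy as np


def gelu(x):
    # tanh approximation of GELU, used as the MLP nonlinearity (assumption)
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x**3)))


def conv1x1(x, w, b):
    # 1x1 convolution on a (C_in, H, W) map: w is (C_out, C_in), b is (C_out,)
    return np.einsum("oc,chw->ohw", w, x) + b[:, None, None]


def self_attention(tokens, wq, wk, wv):
    # Single-head scaled dot-product attention with a residual connection;
    # a simplified stand-in for the Transformer block (no windows, norms, or MLP)
    q, k, v = tokens @ wq, tokens @ wk, tokens @ wv
    scores = q @ k.T / np.sqrt(q.shape[-1])
    scores -= scores.max(axis=-1, keepdims=True)  # numerical stability
    attn = np.exp(scores)
    attn /= attn.sum(axis=-1, keepdims=True)
    return tokens + attn @ v


def init_params(C, hidden, rng, scale=0.3):
    # Hypothetical random initialization of every weight used below
    def w(*shape):
        return rng.normal(0.0, scale, shape)
    return {
        "mlp_w1": w(hidden, C), "mlp_w2": w(C, hidden),
        "merge_w": w(C, C), "merge_b": np.zeros(C),
        "wq": w(2 * C, 2 * C), "wk": w(2 * C, 2 * C), "wv": w(2 * C, 2 * C),
        "out_w": w(C, 2 * C), "out_b": np.zeros(C),
    }


def csp_forward(feat, prompt, p):
    C, H, W = feat.shape
    # GAP + MLP: (C, H, W) -> compact vector z of shape (C,)
    z = p["mlp_w2"] @ gelu(p["mlp_w1"] @ feat.mean(axis=(1, 2)))
    # Element-wise modulation: z broadcast over the prompt's spatial positions
    m = z[:, None, None] * prompt
    # Spatial merging convolution (shape-preserving 1x1 in this sketch)
    f_tilde = conv1x1(m, p["merge_w"], p["merge_b"])
    # Concatenate with the original feature along channels: (2C, H, W)
    x = np.concatenate([feat, f_tilde], axis=0)
    tokens = x.reshape(2 * C, H * W).T  # (H*W, 2C) token sequence
    tokens = self_attention(tokens, p["wq"], p["wk"], p["wv"])
    x = tokens.T.reshape(2 * C, H, W)
    # Output convolution maps back to C channels
    return conv1x1(x, p["out_w"], p["out_b"])


rng = np.random.default_rng(0)
C, H, W = 8, 3, 3
params = init_params(C, hidden=16, rng=rng)
feat = rng.normal(size=(C, H, W))
prompt_a = rng.normal(size=(C, H, W))  # stands in for the prompt of one channel state
prompt_b = rng.normal(size=(C, H, W))  # stands in for the prompt of another state
out_a = csp_forward(feat, prompt_a, params)
out_b = csp_forward(feat, prompt_b, params)
print(out_a.shape)                # (8, 3, 3)
print(np.allclose(out_a, out_b))  # False: the output depends on the selected prompt
```

The output keeps the input feature's shape, so the module can sit between existing Transformer blocks without changing the backbone.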
2. Mathematical Formulation of Prompt Generation and Modulation
The CSP employs a discrete set $\mathcal{P}$ of learnable prompt tensors, indexed by every combination of SNR level and channel distribution. For $L$ CSP stages and $K$ channel states:
$$\mathcal{P} = \{\mathbf{P}_{l,c} : l = 1, \dots, L,\; c = 1, \dots, K\}$$
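The prompt set described above behaves like a lookup table keyed by (CSP stage, channel state). The Python sketch below shows one way to store and index it. It rests on assumptions: the SNR grid (the five evaluation points from Section 5), the two per-stage shapes, and the helper names `build_prompt_bank` and `retrieve` are all hypothetical, and PJSCC's actual grid and shapes may differ.

```python
import itertools

import numpy as np

# Hypothetical discretization of channel states: every (SNR level, channel) pair
SNR_LEVELS_DB = (1, 4, 7, 10, 13)
CHANNEL_TYPES = ("awgn", "rayleigh")
# Hypothetical (C, H, W) feature-map shape at each of the L CSP stages
STAGE_SHAPES = ((32, 16, 16), (64, 8, 8))


def build_prompt_bank(rng, init_std=0.02):
    """Return {(stage, (snr_db, channel)): prompt tensor} and the list of states."""
    states = list(itertools.product(SNR_LEVELS_DB, CHANNEL_TYPES))
    bank = {
        (stage, state): rng.normal(0.0, init_std, size=shape)
        for stage, shape in enumerate(STAGE_SHAPES)
        for state in states
    }
    return bank, states


def retrieve(bank, stage, snr_db, channel):
    # Exact lookup; an SNR that is not on the grid raises KeyError in this sketch
    return bank[(stage, (snr_db, channel))]


bank, states = build_prompt_bank(np.random.default_rng(0))
K, L = len(states), len(STAGE_SHAPES)
print(K, L, len(bank))                         # 10 2 20
print(retrieve(bank, 1, 7, "rayleigh").shape)  # (64, 8, 8)
print(sum(p.size for p in bank.values()))      # 122880
```

Because each prompt is as large as one feature map at its stage, total prompt storage is K times the summed per-stage feature-map sizes. The number of discrete channel states is therefore a direct storage knob.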
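For an input feature $\mathbf{F} \in \mathbb{R}^{H \times W \times C}$ at CSP stage $l$, the channel-aware projection is:
$$\mathbf{z} = \mathrm{MLP}\big(\mathrm{GAP}(\mathbf{F})\big)$$
Retrieval and fusion then use the prompt tensor $\mathbf{P}_{l,c} \in \mathcal{P}$ that matches the current channel state $c$:
$$\mathbf{M} = \mathbf{z} \odot \mathbf{P}_{l,c}$$
$$\tilde{\mathbf{F}} = \mathrm{Conv}(\mathbf{M})$$
$$\mathbf{F}' = \mathrm{Conv}\Big(\mathrm{Transformer}\big([\mathbf{F}, \tilde{\mathbf{F}}]\big)\Big)$$
where $\odot$ denotes element-wise multiplication and $[\cdot, \cdot]$ denotes concatenation. At each CSP stage, only the prompt in $\mathcal{P}$ corresponding to the sampled channel state $c$ is used in a given forward pass.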
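Because only the prompt for the sampled channel state enters the forward pass, only that prompt receives a gradient. The others stay unchanged until their own channel state is sampled. The NumPy check below verifies this with central finite differences on the modulation step M = z ⊙ P_c. It makes several assumptions: the shapes are arbitrary, a single stage is used, and a random upstream gradient stands in for the rest of the network.

```python
import numpy as np

rng = np.random.default_rng(1)
C, H, W, K = 4, 2, 2, 3
prompts = rng.normal(size=(K, C, H, W))  # one stage, K channel states
z = rng.normal(size=C)                   # stand-in for MLP(GAP(F)) of one input
upstream = rng.normal(size=(C, H, W))    # stand-in for dLoss/dM from later layers


def loss(prompts, c):
    m = z[:, None, None] * prompts[c]    # M = z ⊙ P_c
    return float(np.sum(m * upstream))   # linear surrogate with dL/dM = upstream


def numeric_grad(prompts, c, eps=1e-6):
    # Central finite differences with respect to every entry of every prompt
    g = np.zeros_like(prompts)
    for idx in np.ndindex(prompts.shape):
        p_plus = prompts.copy()
        p_plus[idx] += eps
        p_minus = prompts.copy()
        p_minus[idx] -= eps
        g[idx] = (loss(p_plus, c) - loss(p_minus, c)) / (2 * eps)
    return g


c = 1
g = numeric_grad(prompts, c)
analytic = z[:, None, None] * upstream        # dL/dP_c = z ⊙ upstream
print(np.allclose(g[c], analytic, atol=1e-6))  # True
print(np.abs(np.delete(g, c, axis=0)).max())   # 0.0: unselected prompts get no gradient
```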
3. Feature and Prompt Fusion Strategy
CSP’s fusion scheme consists of three stages:
- Channel-wise Modulation: Element-wise multiplication of the compacted feature vector and prompt tensor aligns semantic information with channel state.
- Spatial Merging: A small convolutional operator reconstructs the modulated feature into full spatial dimensions for the next network stage.
- Transformer Fusion: Concatenation with the original feature map is followed by a Transformer block and convolution for joint feature aggregation.
This design provides a fully differentiable, unified mechanism for channel-adaptive feature modulation. It allows the backbone network to learn how to prioritize information for transmission according to the specific channel condition, that is, the SNR level and the channel distribution.
4. Training Objective and Adaptability
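The PJSCC framework with CSP modules is trained end-to-end with a mean-squared error (MSE) loss between the input image $\mathbf{x}$ and its reconstruction $\hat{\mathbf{x}}$:
$$\mathcal{L}(\phi, \theta, \mathcal{P}) = \mathbb{E}_{\mathbf{x},\, c,\, \mathbf{y}}\Big[\tfrac{1}{n}\lVert \mathbf{x} - \hat{\mathbf{x}} \rVert_2^2\Big]$$
Here $\hat{\mathbf{x}} = g_\theta(\mathbf{y}, c)$ and $n$ is the number of pixel values. $\mathbf{y}$ denotes the received signal realization, obtained by passing the encoder output $f_\phi(\mathbf{x}, c)$ through the channel in state $c$. The encoder parameters $\phi$, decoder parameters $\theta$, and prompts $\mathcal{P}$ are optimized jointly. During training, channel states are sampled across SNR ∈ [1, 13] dB and across both AWGN and Rayleigh channels. This lets a single model generalize to that range of conditions without subsequent fine-tuning or separate model deployments (Zhang et al., 2024).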
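The NumPy sketch below shows the per-sample channel-state sampling and the channel layer that sit between the encoder and decoder during training, plus the MSE objective. The encoder and decoder are omitted. Several details are assumptions for illustration, not details confirmed for PJSCC: the uniform SNR draw, complex unit-power symbols, and one Rayleigh coefficient per transmitted block with no equalization.

```python
import numpy as np


def sample_channel_state(rng):
    # Assumption: SNR uniform over the stated [1, 13] dB range, channel type uniform
    snr_db = float(rng.uniform(1.0, 13.0))
    channel = ("awgn", "rayleigh")[rng.integers(2)]
    return snr_db, channel


def power_normalize(s):
    # Scale complex symbols to unit average power
    return s / np.sqrt(np.mean(np.abs(s) ** 2))


def transmit(s, snr_db, channel, rng):
    # With unit signal power, the complex noise variance is 10^(-SNR/10)
    noise_var = 10.0 ** (-snr_db / 10.0)
    n = np.sqrt(noise_var / 2.0) * (rng.normal(size=s.shape) + 1j * rng.normal(size=s.shape))
    if channel == "awgn":
        return s + n
    # Assumed block fading: one CN(0, 1) coefficient for the whole block
    h = (rng.normal() + 1j * rng.normal()) / np.sqrt(2.0)
    return h * s + n


def mse(x, x_hat):
    # Per-pixel mean squared error, the training objective above
    return float(np.mean((x - x_hat) ** 2))


rng = np.random.default_rng(0)
snr_db, channel = sample_channel_state(rng)
s = power_normalize(rng.normal(size=4096) + 1j * rng.normal(size=4096))
y_sampled = transmit(s, snr_db, channel, rng)  # one training-style transmission
y_awgn = transmit(s, 10.0, "awgn", rng)        # fixed 10 dB AWGN for a sanity check
measured_snr_db = 10.0 * np.log10(np.mean(np.abs(s) ** 2) / np.mean(np.abs(y_awgn - s) ** 2))
print(snr_db, channel)
print(round(measured_snr_db, 1))  # close to 10.0
```

Because the channel state is drawn per sample and passed to both encoder and decoder, the same weights see the whole SNR range and both channel types during training.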
5. Empirical Performance
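Comprehensive evaluations show that including CSP modules in PJSCC yields better image reconstruction quality than both the ablated variants and the baseline systems. The table below covers CIFAR-10 over an AWGN channel with a channel bandwidth ratio (CBR) of 1/3 and gives the reported PSNR (dB) at each channel SNR. "w/ AF" replaces CSP with simple SNR concatenation, and "w/o CSP" uses no prompting.
| Method | SNR 1 dB | SNR 4 dB | SNR 7 dB | SNR 10 dB | SNR 13 dB |
|---|---|---|---|---|---|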
| w/ CSP (PJSCC) | 30.30 | 33.28 | 35.92 | 38.17 | 39.97 |
| w/ AF | 29.64 | 32.64 | 34.84 | 36.83 | 38.58 |
| w/o CSP | 29.56 | 32.50 | 34.68 | 36.65 | 38.40 |
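The margins in this table can be recomputed directly. The short Python snippet below copies the values from the table (the helper name `psnr_gains` is hypothetical). It shows that the gain from CSP broadly widens with SNR, with a slight dip at 4 dB relative to AF.

```python
SNRS_DB = [1, 4, 7, 10, 13]
PSNR_DB = {
    "w/ CSP (PJSCC)": [30.30, 33.28, 35.92, 38.17, 39.97],
    "w/ AF": [29.64, 32.64, 34.84, 36.83, 38.58],
    "w/o CSP": [29.56, 32.50, 34.68, 36.65, 38.40],
}


def psnr_gains(method, baseline):
    # Per-SNR PSNR difference in dB, rounded to the table's precision
    return [round(m - b, 2) for m, b in zip(PSNR_DB[method], PSNR_DB[baseline])]


for baseline in ("w/ AF", "w/o CSP"):
    gains = psnr_gains("w/ CSP (PJSCC)", baseline)
    print(baseline, dict(zip(SNRS_DB, gains)), "mean:", round(sum(gains) / len(gains), 2))
# w/ AF {1: 0.66, 4: 0.64, 7: 1.08, 10: 1.34, 13: 1.39} mean: 1.02
# w/o CSP {1: 0.74, 4: 0.78, 7: 1.24, 10: 1.52, 13: 1.57} mean: 1.17
```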
Consistent improvements are also observed in LPIPS. Across full PSNR-versus-SNR curves and multiple datasets (CIFAR-10, Kodak, CLIC), PJSCC with CSP outperforms DeepJSCC, ADJSCC, WITT, and BPG+LDPC over both AWGN and Rayleigh channels. These gains indicate that channel-aware prompting enables robust semantic transmission without a proliferation of models (Zhang et al., 2024).
6. Memory Efficiency and Scalability
PJSCC with CSP modules achieves competitive inference times and computational cost. For high-resolution images, it requires 52.3 ms and 39.5 G FLOPs per sample; for low-resolution, 62.4 µs and 767 M FLOPs. This is comparable to WITT and an order of magnitude more efficient than ADJSCC and DeepJSCC (610× FLOPs).
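The reported per-sample costs can be translated into throughput and a compute ratio with simple arithmetic. The Python sketch below assumes samples are processed one at a time, back to back, with no batching. Real throughput with batching may differ.

```python
# Reported per-sample costs from this section
COSTS = {
    "high-res": {"latency_s": 52.3e-3, "flops": 39.5e9},
    "low-res": {"latency_s": 62.4e-6, "flops": 767e6},
}


def samples_per_second(latency_s):
    # Assumes sequential, unbatched processing
    return 1.0 / latency_s


for name, cost in COSTS.items():
    print(f"{name}: {samples_per_second(cost['latency_s']):.1f} samples/s")
ratio = COSTS["high-res"]["flops"] / COSTS["low-res"]["flops"]
print(f"high-res needs {ratio:.1f}x the FLOPs of low-res")
# high-res: 19.1 samples/s
# low-res: 16025.6 samples/s
# high-res needs 51.5x the FLOPs of low-res
```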
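In terms of model storage, PJSCC-U integrates both AWGN and Rayleigh channels in a single model. The baselines need a separate model per channel type, so their storage doubles when both are supported. As a result, PJSCC-U has the smallest two-channel footprint in the table, although WITT and ADJSCC are smaller when only one channel type is needed:
| Model | Storage, one channel type | Storage, two channel types |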
|---|---|---|
| PJSCC-U | 178 MB | 216 MB |
| WITT | 120 MB | 240 MB |
| ADJSCC | 131 MB | 262 MB |
| DeepJSCC | 1469 MB | 2938 MB |
This compactness is attributed to the modular prompt design. A single backbone is shared across channel environments, and learnable prompts carry the channel-specific conditioning, which eliminates redundant parameterization per channel environment.
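The table's incremental costs can be made explicit with a little Python arithmetic, using values copied from the table. Adding the second channel type costs PJSCC-U 38 MB, about 21% of its single-channel size, whereas each baseline doubles. The calculation also shows a crossover: WITT is the smallest single-channel model, but PJSCC-U is the smallest once both channel types are needed.

```python
STORAGE_MB = {  # (one channel type, two channel types), from the table above
    "PJSCC-U": (178, 216),
    "WITT": (120, 240),
    "ADJSCC": (131, 262),
    "DeepJSCC": (1469, 2938),
}

for model, (one, two) in STORAGE_MB.items():
    extra = two - one
    print(f"{model}: +{extra} MB (+{100 * extra / one:.0f}%) for the second channel type")

# Smallest model when one vs. two channel types must be supported
smallest_one = min(STORAGE_MB, key=lambda m: STORAGE_MB[m][0])
smallest_two = min(STORAGE_MB, key=lambda m: STORAGE_MB[m][1])
print(smallest_one, smallest_two)  # WITT PJSCC-U
```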
7. Ablation Studies and Robustness
Ablations isolate the contribution of CSP prompts by comparing them against simple SNR concatenation (AF) and against no prompting. The reported gains are ≈1 dB over AF and ≈1.4 dB over the no-prompt variant. In the CIFAR-10 AWGN table of Section 5, the per-SNR margin broadly widens with SNR: over AF it rises from 0.66 dB at 1 dB SNR to 1.39 dB at 13 dB SNR, and over the no-prompt variant from 0.74 dB to 1.57 dB. Each prompt tensor is sized to match the spatial and channel dimensions of the feature map at its CSP stage $l$, i.e., $\mathbf{P}_{l,c} \in \mathbb{R}^{H_l \times W_l \times C_l}$. Robustness tests indicate that the model generalizes to previously unseen intermediate SNR values and to both supported channel types, maintaining reconstruction quality without additional fine-tuning or retraining (Zhang et al., 2024).
CCP, as operationalized via CSP in PJSCC, constitutes a lightweight, fully differentiable prompt-based conditioning mechanism. It incorporates structured channel priors and achieves dynamic adaptation, memory efficiency, and high-fidelity semantic transmission across a broad spectrum of wireless conditions, all within a single unified DeepJSCC model (Zhang et al., 2024).