
DeepJSCC: End-to-End Neural Communications

Updated 12 January 2026
  • DeepJSCC is an end-to-end neural network system that jointly optimizes source and channel coding for robust communications under real-world constraints.
  • It employs convolutional autoencoders with adaptive mechanisms like attention and feature learning to maintain performance amidst noise and fading.
  • Performance evaluations show significant PSNR gains and graceful degradation compared to traditional separation methods, ensuring low latency and semantic fidelity.

Deep Joint Source-Channel Coding (DeepJSCC) is a paradigm in wireless communications that replaces the classic separated approach—source coding, channel coding, modulation—with an end-to-end neural network architecture jointly optimized over real-world channel conditions and performance criteria. DeepJSCC systems directly map source vectors (such as images) to channel symbols and back, learning feature representations that are inherently robust to noise, fading, and bandwidth constraints. This approach yields graceful performance degradation, semantic adaptivity, and low-latency operation, representing a major advance over separation-based pipelines, particularly in power- or bandwidth-limited scenarios (Xu et al., 2022).

1. Mathematical Formulation and System Model

The canonical DeepJSCC system consists of a source vector $s\in\mathbb{R}^n$ (e.g., image pixels), an encoder $x=f_\theta(s)\in\mathbb{C}^k$ parameterized by neural weights $\theta$, a memoryless channel $y\sim p(y|x)$ (commonly AWGN: $y=x+w$, $w\sim\mathcal{CN}(0,\sigma^2 I)$), and a decoder $\hat{s}=g_\phi(y)\in\mathbb{R}^n$ with neural weights $\phi$. The channel bandwidth ratio $\rho=k/n$ is either fixed or adaptively chosen. The system is subject to an average power constraint $\mathbb{E}_s[\|f_\theta(s)\|^2]\leq kP$, i.e., at most $P$ per symbol (Xu et al., 2022).
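A minimal NumPy sketch of two pieces of this system model—the power normalization and the complex AWGN channel—may help fix the notation. The function names and the unit-power/SNR parameterization are illustrative assumptions, not code from the paper:

```python
import numpy as np

def power_normalize(x, P=1.0):
    """Scale a block of k complex symbols so that ||x||^2 = k * P,
    i.e. average power per symbol equals P."""
    k = x.size
    return x * np.sqrt(k * P / np.sum(np.abs(x) ** 2))

def awgn_channel(x, snr_db, rng):
    """y = x + w with w ~ CN(0, sigma^2 I); sigma^2 is set from the
    SNR assuming unit average signal power after normalization."""
    sigma2 = 10.0 ** (-snr_db / 10.0)
    w = np.sqrt(sigma2 / 2.0) * (rng.standard_normal(x.shape)
                                 + 1j * rng.standard_normal(x.shape))
    return x + w
```

In an end-to-end system the channel must stay differentiable, which the additive-noise form above preserves when rewritten in a deep-learning framework.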

Training seeks the minimization $\min_{\theta,\phi}~\mathbb{E}_{s\sim p(s),\,y\sim p(y|f_\theta(s))}\left[\,d(s,\,g_\phi(y))\,\right]$ subject to the power constraint, with distortion $d$ typically mean squared error ($\|\cdot\|^2$), though perceptual/semantic metrics (e.g., LPIPS, MS-SSIM) are also employed.

2. Network Architectures and Adaptive Extensions

Early DeepJSCC realizations employed convolutional autoencoders with symmetric encoder-decoder configurations: for example, five convolutional layers with increasing channel counts and down-sampling in the encoder, ending in a tanh-activated layer whose output is reshaped to $k$ complex symbols. The decoder mirrors the process with transposed convolutions (Xu et al., 2022).

Bandwidth-adaptive DeepJSCC architectures divide the $k$ symbols into $L$ layers, where training randomly selects subset sizes $\ell\in\{1,\dots,L\}$ so the network learns to prioritize dimensions. Attention-Fusion (AF) blocks and Feature-Learning (FL) blocks enhance robustness to SNR/channel-state variations by injecting instantaneous CSI or SNR as attention inputs (Xu et al., 2022).

Representative encoder (image input 32×32×3, output $k$ complex symbols):

| Layer | Channels | Kernel/Stride | Activation |
|-------|----------|---------------|------------|
| conv1 | 3→64     | 3×3 / 1       | ReLU       |
| conv2 | 64→128   | 3×3 / 2       | ReLU       |
| conv3 | 128→256  | 3×3 / 2       | ReLU       |
| conv4 | 256→512  | 3×3 / 2       | ReLU       |
| conv5 | 512→2k   | 3×3 / 1       | tanh       |

A normalization layer scales the encoder output to satisfy the average power constraint. The decoder mirrors the encoder, using up-sampling (transposed convolutions) and tanh activation.
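The encoder table can be sketched in PyTorch as follows. One assumption to note: the table lists the final layer as "2k" outputs, but after three stride-2 layers a 32×32 input leaves a 4×4 feature map, so this sketch gives the last convolution $2k/16$ channels so that the flattened output holds exactly $2k$ real values ($k$ complex symbols, $k$ divisible by 8):

```python
import torch
import torch.nn as nn

class DeepJSCCEncoder(nn.Module):
    """Five-layer conv encoder following the table; emits k complex
    channel symbols with unit average power per 32x32x3 image."""
    def __init__(self, k):
        super().__init__()
        assert k % 8 == 0, "k must be divisible by 8 for a 4x4 feature map"
        self.net = nn.Sequential(
            nn.Conv2d(3, 64, 3, stride=1, padding=1), nn.ReLU(),
            nn.Conv2d(64, 128, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(128, 256, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(256, 512, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(512, 2 * k // 16, 3, stride=1, padding=1), nn.Tanh(),
        )

    def forward(self, s):
        z = self.net(s).flatten(1)                 # (B, 2k) real values
        x = torch.complex(z[:, 0::2], z[:, 1::2])  # (B, k) complex symbols
        power = x.abs().pow(2).mean(dim=1, keepdim=True)
        return x / power.sqrt()                    # enforce unit average power
```

The decoder would mirror this with `nn.ConvTranspose2d` layers and a final tanh.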

3. Training Objectives, Semantic Losses, and Optimization

Base DeepJSCC training uses batchwise mean squared error loss,

$$L(\theta,\phi) = \mathbb{E}_{s}\,\mathbb{E}_{y|s}\left[\,\|s - g_\phi(y)\|^2\,\right]$$

with the Adam optimizer, a typical learning rate of $10^{-4}$, batch sizes of 32–64, trained for several hundred epochs (Xu et al., 2022).
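A single end-to-end training step under these settings might look like the sketch below. The real-valued symbol representation (the $2k$ real components of the $k$ complex symbols), the noise scaling, and the helper names are assumptions; `encoder` and `decoder` stand in for $f_\theta$ and $g_\phi$:

```python
import torch

def train_step(encoder, decoder, opt, batch, snr_db):
    """One DeepJSCC step: encode, add differentiable AWGN, decode, and
    backpropagate the MSE through the channel into both networks."""
    x = encoder(batch)                        # (B, 2k), unit avg power assumed
    sigma = (10.0 ** (-snr_db / 10.0)) ** 0.5
    y = x + sigma * torch.randn_like(x)       # AWGN at the requested SNR
    s_hat = decoder(y)
    loss = torch.mean((batch - s_hat) ** 2)   # MSE distortion
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()
```

With SNR-adaptive training, `snr_db` would simply be drawn at random for each batch.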

Semantic/perceptual losses augment or reweight the MSE to target the downstream task directly, e.g. $L = \alpha\,\text{MSE} + \beta\,(1-\text{MS-SSIM}) + \gamma\,\text{LPIPS}$, or replace it with cross-entropy for classification/retrieval tasks.
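This weighted objective can be written generically; in the NumPy sketch below the perceptual terms (e.g., $1-\text{MS-SSIM}$ or LPIPS, which come from external libraries) are passed in as weighted callables. The function signature is an illustrative assumption:

```python
import numpy as np

def semantic_loss(s, s_hat, perceptual_terms, alpha=1.0):
    """L = alpha * MSE + sum_i w_i * f_i(s, s_hat), where each entry of
    perceptual_terms is a (weight, callable) pair, e.g.
    (beta, one_minus_ms_ssim) or (gamma, lpips_distance)."""
    mse = float(np.mean((s - s_hat) ** 2))
    return alpha * mse + sum(w * f(s, s_hat) for w, f in perceptual_terms)
```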

Regularization (weight decay), SNR-adaptive training (drawing a random SNR for each batch), and curriculum learning (progressively increasing task difficulty) further enhance robustness to varying channel conditions.

4. Performance Evaluation and Comparison

DeepJSCC exhibits substantial rate–distortion gains in practical finite blocklength scenarios compared to separation-based baselines (e.g., BPG+polar+16-QAM):

| Scheme                   | PSNR @ 0 dB | PSNR @ 5 dB | PSNR @ 10 dB | Graceful degradation |
|--------------------------|-------------|-------------|--------------|----------------------|
| BPG + polar (separated)  | 20.1 dB     | 22.5 dB     | 24.0 dB      | No (cliff effect)    |
| DeepJSCC (per-SNR)       | 25.0 dB     | 27.8 dB     | 30.2 dB      | Yes                  |
| DeepJSCC (SNR-adaptive)  | 24.6 dB     | 27.2 dB     | 29.5 dB      | Yes                  |

Trained DeepJSCC achieves a 2–4 dB PSNR improvement over the separated baseline, and reconstructions degrade smoothly as the SNR falls below training conditions, without cliff effects (Xu et al., 2022). Semantic-loss variants yield perceptually enhanced outputs at equivalent bandwidth. Latency for 32×32 images is ∼360 ms on CPU and ∼10 ms on GPU; a typical encoder-decoder pair has ∼12.6M parameters.
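The PSNR figures above follow the standard definition; a minimal sketch, assuming pixel values in $[0, \text{max\_val}]$:

```python
import numpy as np

def psnr_db(s, s_hat, max_val=1.0):
    """Peak signal-to-noise ratio in dB: 10 * log10(max_val^2 / MSE)."""
    mse = np.mean((s - s_hat) ** 2)
    return 10.0 * np.log10(max_val ** 2 / mse)
```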

5. Semantic Communications Aspects

DeepJSCC directly aligns the feature mapping and reconstruction error to the task loss, facilitating semantic communications. For downstream tasks (retrieval, classification), loss functions are replaced by cross-entropy or BLEU for text, and networks are extended with pre/post-processing feature extractors (Xu et al., 2022).

Semantic metrics—LPIPS, MS-SSIM, cross-entropy—serve as optimization targets. The system naturally prioritizes task-relevant features that are most robust to channel perturbations.

6. Complexity, Latency, and Robustness Considerations

Network complexity for low-resolution images is modest (∼12.6M parameters, ∼1.5B FLOPs/image), and scaling to higher resolutions increases costs linearly. DeepJSCC implementations are highly parallelizable. End-to-end latency (encoding + decoding + simulation) remains competitive with digital baselines; constraint enforcement via power-normalization introduces negligible overhead (Xu et al., 2022).

Peak-to-average power ratio (PAPR) is a practical concern when employing OFDM; integration with waveform-level design is an open challenge.

7. Future Research Directions

Highlighted open problems include:

  • Integrated security: source-domain joint encryption and source-channel coding (DeepJESCC), and symbol-domain public-key encryption via LWE (DeepJSCEC, with IND-CPA security) (Xu et al., 2022).
  • Universal multimodal architectures: DeepJSCC extension beyond images (video, text, CSI), multi-modal neural encoders.
  • Multi-user and distributed settings: DeepJSCC in broadcast, MAC, interference channels where Shannon separation breaks down.
  • Training on nonstandard/unknown channels (e.g., underwater, optical) via pure data-driven learning.
  • Device-level adaptation: efficient on-device retraining for nonstationary sources/channels.
  • Semantic–pragmatic extension: reinforcement learning environments where messages become actions (integration with RL).
  • Joint waveform/network design for PAPR reduction, OFDM optimization.

These directions are critical for deployment in next-generation semantic and ultra-reliable low-latency communication networks.


DeepJSCC embodies an end-to-end deep learning approach for joint source-channel coding, optimizing the complete communications pipeline in a task-driven fashion. By replacing separated coding stages with a single autoencoder architecture, DeepJSCC achieves graceful degradation, semantic adaptivity, low latency, and parallel implementation, establishing itself as a foundational technology for the forthcoming generation of semantic communication systems (Xu et al., 2022).
