Adaptive Semantic Communication for Wireless Image Transmission Leveraging Mixture-of-Experts Mechanism

Published 3 Apr 2026 in cs.LG | (2604.02691v1)

Abstract: Deep learning based semantic communication has achieved significant progress in wireless image transmission, but most existing schemes rely on fixed models and thus lack robustness to diverse image contents and dynamic channel conditions. To improve adaptability, recent studies have developed adaptive semantic communication strategies that adjust transmission or model behavior according to either source content or channel state. More recently, MoE-based semantic communication has emerged as a sparse and efficient adaptive architecture, although existing designs still mainly rely on single-driven routing. To address this limitation, we propose a novel multi-stage end-to-end image semantic communication system for multi-input multi-output (MIMO) channels, built upon an adaptive MoE Swin Transformer block. Specifically, we introduce a dynamic expert gating mechanism that jointly evaluates both real-time CSI and the semantic content of input image patches to compute adaptive routing probabilities. By selectively activating only a specialized subset of experts based on this joint condition, our approach breaks the rigid coupling of traditional adaptive methods and overcomes the bottlenecks of single-driven routing. Simulation results indicate a significant improvement in reconstruction quality over existing methods while maintaining the transmission efficiency.

Authors (2)

Summary

The paper presents a novel adaptive semantic communication system leveraging a Mixture-of-Experts mechanism for efficient wireless image transmission over MIMO channels.
It employs a dynamic gating mechanism that jointly evaluates channel state information and semantic content, enhancing reconstruction fidelity and spectral efficiency.
Experimental evaluations show superior PSNR/LPIPS performance and balanced expert utilization, validating the method's adaptability under diverse transmission conditions.

Adaptive Semantic Communication with Mixture-of-Experts for Wireless Image Transmission

Introduction

Semantic communication represents a paradigm shift in wireless transmission, focusing on the preservation and effective conveyance of underlying semantic information rather than on the reliable bitwise transmission advocated by classical Shannon theory. Within the context of image transmission over MIMO fading channels, deep learning-based semantic communication frameworks have demonstrated significant promise, with successive architectures evolving from CNN-based models to powerful Transformer variants. However, fixed-model designs are inherently limited in their adaptability to varying image content and highly dynamic channel conditions, often resulting in degraded generalization and suboptimal resource utilization.

Recent work has begun to explore adaptive semantic communication, typically by adding either content-adaptive or channel-adaptive modules. These systems tend to be rigid: they fuse side information directly with the features, which adds computational overhead and scales poorly. Mixture-of-Experts (MoE) architectures have recently been introduced to this domain, offering sparse, efficient computation. Nevertheless, existing MoE systems still rely on single-driven expert selection, considering either semantics or channel characteristics, but not both jointly.

Proposed Architecture and Adaptive Routing Mechanism

The paper proposes an end-to-end wireless image semantic communication system for MIMO fading channels, built on an Adaptive MoE Swin Transformer (AD-MoE ST) backbone. The core contribution is a dynamic gating mechanism within the MoE layer that makes expert activation decisions by jointly evaluating real-time Channel State Information (CSI) and the semantic content of each input image patch. This joint-driven routing replaces both the direct feature fusion of earlier adaptive schemes and the single-driven routing of earlier MoE designs, with the aim of adapting across a wider range of transmission scenarios.

The architectural pipeline consists of:

Two initial stages of convolutional feature extractors, which operate at the higher image resolutions.
Two subsequent stages of stacked AD-MoE Swin Transformer blocks, which extract features at reduced resolutions.

Each AD-MoE ST block incorporates both multi-head self-attention (alternating standard and shifted windows) and a novel MoE-based feedforward layer (Figure 1). The feedforward layer consists of shared experts (always active) and routed experts (selectively activated), with sparsity and routing diversity achieved via a bespoke gating module. The router computes expert selection probabilities from the concatenated semantic features and CSI. An adaptive selection step then activates only the experts whose probability is close to the maximum, as decided by a difference-threshold criterion, and caps the number of active experts with a sparsity hyperparameter $k$.
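The selection rule described above can be illustrated with a minimal sketch. It is not the authors' implementation: the linear router, the softmax, the threshold name `delta`, the default values, and the renormalization of the gate weights are all assumptions made for illustration, and shared experts are left out because they are always active.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def route(patch_feat, csi_feat, router_weights, delta=0.1, k=2):
    """Pick routed experts for one patch (hypothetical helper).

    patch_feat, csi_feat: lists of floats (semantic features and CSI features).
    router_weights: one weight row per routed expert, each of length
        len(patch_feat) + len(csi_feat) (an assumed linear router).
    delta: assumed difference threshold on the routing probabilities.
    k: cap on the number of activated routed experts.
    """
    # Joint condition: concatenate semantic features and CSI.
    x = list(patch_feat) + list(csi_feat)
    logits = [sum(w * v for w, v in zip(row, x)) for row in router_weights]
    probs = softmax(logits)

    # Rank experts by probability, best first.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    p_max = probs[order[0]]

    # Keep experts within `delta` of the best one, then cap at k.
    chosen = [i for i in order if p_max - probs[i] <= delta][:k]

    # Renormalize the gate weights over the chosen experts (an assumption).
    norm = sum(probs[i] for i in chosen)
    gates = [probs[i] / norm for i in chosen]
    return chosen, gates
```

Under this rule a patch with one clearly dominant expert activates a single expert, while an ambiguous patch can activate up to $k$, so the number of active experts varies with the joint condition.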

Figure 2: The overall structure of the proposed scheme for wireless image transmission.

Figure 1: (a) Architecture of two successive AD-MoE STBlocks. (b) Architecture of AD-MoE MLP Block with routed and shared experts.

Power-normalized patch-level features are projected to the MIMO channel for transmission. The decoder mirrors the encoder design and reconstructs images from the received latent codes. The bandwidth ratio $R = \frac{K}{H \times W \times 3}$ quantifies spectral efficiency for an $H \times W$ RGB image, with $K$ taken here to be the number of transmitted channel symbols.
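As a rough illustration of the two quantities in this paragraph, the sketch below normalizes a block of complex symbols to a target average power and evaluates the bandwidth ratio. The function names are hypothetical, and reading $K$ as the number of transmitted channel symbols is an assumption based on common usage in joint source-channel coding work, not something this summary states.

```python
import math

def power_normalize(symbols, power=1.0):
    """Scale complex symbols so that their average power equals `power`."""
    avg = sum(abs(s) ** 2 for s in symbols) / len(symbols)
    if avg == 0:
        raise ValueError("cannot normalize an all-zero symbol block")
    scale = math.sqrt(power / avg)
    return [s * scale for s in symbols]

def bandwidth_ratio(num_symbols, height, width, channels=3):
    """R = K / (H * W * 3), with K assumed to count channel symbols."""
    return num_symbols / (height * width * channels)

# Example: a 768x512 RGB image sent with 98,304 symbols gives R = 1/12.
ratio = bandwidth_ratio(98304, 512, 768)
```

Under that reading, the setting $R = 0.0833$ used in Figure 4 would correspond to about one channel symbol per twelve source values.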

Training Objectives and Expert Regularization

The loss function integrates standard mean squared error (MSE) for reconstruction fidelity with a three-part regularization scheme aimed at ensuring robustness and diversity in expert utilization:

Load balancing loss: Mitigates expert collapse by promoting uniform activation frequencies among routed experts.
Entropy regularization: Encourages stochastic and diverse expert routing.
Variance penalty: Reduces the systematic underutilization of any particular expert.

This combination preserves the efficiency and adaptability of the MoE structure, especially under non-stationary input and channel distributions.
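The three regularizers can be sketched as follows. The exact formulas are not given in this summary, so the forms below are stand-ins: the load balancing term follows the widely used Switch Transformer style product of dispatch fraction and mean router probability, the entropy term is the mean entropy of the per-token routing distribution, and the variance term is the variance of the dispatch fractions. All names and weights are hypothetical.

```python
import math

def expert_regularizers(probs, selections, num_experts):
    """Return (load_balance, entropy, variance) for a batch of tokens.

    probs: per-token router probabilities, one list of length num_experts each.
    selections: per-token lists of the expert indices that were activated.
    """
    n = len(probs)

    # Fraction of tokens dispatched to each expert.
    frac = [0.0] * num_experts
    for sel in selections:
        for e in sel:
            frac[e] += 1.0 / n

    # Mean router probability assigned to each expert.
    mean_prob = [sum(row[e] for row in probs) / n for e in range(num_experts)]

    # Load balancing: smallest when dispatch and probability mass are uniform.
    load_balance = num_experts * sum(f * p for f, p in zip(frac, mean_prob))

    # Mean routing entropy: larger values mean more diverse routing.
    entropy = -sum(q * math.log(q) for row in probs for q in row if q > 0) / n

    # Variance of dispatch fractions: zero when all experts are used equally.
    mean_frac = sum(frac) / num_experts
    variance = sum((f - mean_frac) ** 2 for f in frac) / num_experts
    return load_balance, entropy, variance

def total_loss(mse, load_balance, entropy, variance,
               w_lb=0.01, w_ent=0.001, w_var=0.01):
    """MSE plus regularizers; entropy is subtracted so that it is maximized.
    The weights are illustrative placeholders, not values from the paper."""
    return mse + w_lb * load_balance - w_ent * entropy + w_var * variance
```

With two experts, perfectly balanced routing gives a load balancing value of 1 and zero variance, while routing every token to one expert doubles the load balancing value and raises the variance, so collapsed routing is penalized at equal reconstruction error.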

Experimental Evaluation

The model was trained on the DIV2K training dataset and evaluated on the Kodak dataset. The proposed AD-MoE Swin Transformer based system was compared against DeepJSCC and the canonical SwinJSCC under multiple MIMO antenna configurations and varying SNR and bandwidth constraints. Performance was assessed primarily via PSNR (higher is better) and LPIPS (lower is better).
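For reference, PSNR follows directly from the mean squared error between the original and reconstructed pixel values. The sketch below assumes 8-bit images flattened into lists, so the peak value is 255. LPIPS is not shown because it depends on a pretrained network.

```python
import math

def psnr(reference, reconstruction, max_val=255.0):
    """Peak signal-to-noise ratio in dB between two equal-length pixel lists."""
    if len(reference) != len(reconstruction):
        raise ValueError("inputs must have the same number of pixels")
    mse = sum((a - b) ** 2 for a, b in zip(reference, reconstruction)) / len(reference)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_val ** 2 / mse)
```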

Key findings include:

Superior performance under all tested SNRs and antenna configurations, with the proposed model surpassing DeepJSCC by considerable PSNR and LPIPS margins and outperforming SwinJSCC at both low and high bandwidth ratios (Figure 3).
Qualitative image reconstructions demonstrate marked visual improvements over benchmarks, especially under adverse channel fading conditions (Figure 4).
Ablation evidence: Adding AD-MoE blocks to the baseline SwinJSCC notably improves image fidelity and perceptual quality, which attributes the gains to the adaptive MoE routing strategy.

Figure 3: PSNR and LPIPS curves for different methods under varying SNR and bandwidth ratio over MIMO fading channels.

Figure 4: Example reconstructions at SNR = 10 dB, R = 0.0833, $N_t = 8$, $N_r = 8$ using DeepJSCC, SwinJSCC, and the proposed scheme.

Expert activation frequency analysis across 60 CLIC2021 images reveals that all routed experts receive substantial and relatively balanced activation, with an average of approximately 1.68 experts per image transmission. This validates the model’s ability to adaptively scale its representational capacity according to task demand and channel state while avoiding over-specialization or collapse (Figure 5).

Figure 5: Expert activation frequency in the final encoder AD-MoE STBlock on the CLIC2021 dataset.
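Activation statistics of this kind can be gathered from logged routing decisions, as in the short sketch below. The helper name and the input layout (one list of activated expert indices per routing decision) are assumptions, and whether the reported average is taken per image or per patch is not specified in this summary.

```python
from collections import Counter

def activation_stats(selections, num_experts):
    """Return (per-expert activation frequency, mean active experts per decision)."""
    if not selections:
        raise ValueError("no routing decisions were logged")
    counts = Counter(e for sel in selections for e in sel)
    total = sum(counts.values())
    # Share of all activations that went to each expert.
    freq = [counts.get(e, 0) / total if total else 0.0 for e in range(num_experts)]
    # Average number of routed experts activated per decision.
    avg_active = total / len(selections)
    return freq, avg_active
```

A roughly flat frequency list indicates balanced utilization, and an average between 1 and $k$ indicates that the number of active experts varies across inputs.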

Implications and Future Prospects in AI-driven Communication

The results indicate that joint routing, which conditions expert selection on both semantic content and channel state, improves the efficiency and robustness of semantic communication systems. The proposed AD-MoE mechanism offers a reference design for expert sparsity, routing diversity, and capacity scaling in bandwidth- and resource-constrained wireless environments.

Practical implications span:

Scalable edge deployment: Highly dynamic adaptation to hardware, bandwidth, or channel environment makes the method suitable for edge and IoT applications.
Task-aware adaptation: The model can be extended to multi-task scenarios, leveraging the modularity of the MoE for joint vision-language communications.
Interoperability with quantization and feedback: The architecture is compatible with quantization, coding, or feedback modules for further efficiency gains.

Future research directions may focus on integrating multi-modal semantic tasks, cross-domain generalization, or combining with state-space and diffusion architectures for even greater robustness. Exploration of online adaptation, low-latency inference, and federated or decentralized semantic communication systems would align well with the demonstrated adaptability of the proposed routing mechanism.

Conclusion

By introducing a jointly adaptive MoE mechanism into a Swin Transformer backbone, this study delivers a highly efficient, scalable semantic communication system for wireless image transmission over MIMO fading channels. Dynamic expert selection informed by both semantic and channel state features offers substantial improvements in reconstruction fidelity and resource utilization, overcoming the rigidity of fixed and single-driven adaptive models. The results suggest compelling avenues for advancing AI-driven, context-aware wireless communication frameworks.
