Generative Semantic Communications
- Generative semantic communications are advanced systems that leverage AI to extract and transmit only task-relevant semantics, reducing bandwidth consumption.
- They employ powerful generative models like transformers and diffusion models to reconstruct and calibrate content across modalities while minimizing semantic distortion.
- These systems enable mobile–edge–cloud collaboration, fusing context reasoning, semantic JSCC, and automated knowledge provisioning to achieve significant resource savings and improved perceptual quality.
Generative semantic communications (GSC) represent a paradigm shift in the design of communication systems, in which the delivery objective transitions from transmitting bit-accurate data to conveying meaning and enabling information regeneration at the receiver. By integrating advanced generative artificial intelligence (GAI) models—particularly large-scale generative models—into the semantic communication loop, GSC systems achieve highly efficient, robust, and flexible content delivery across modalities such as text, images, video, and speech. GSC leverages context reasoning, automated background-knowledge provisioning, and AI-driven joint source–channel coding (JSCC) to transmit only the core task-relevant semantics, allowing the receiver to reconstruct, regenerate, or calibrate content, thereby minimizing semantic distortion and drastically reducing resource consumption while preserving meaning (Xia et al., 2023, Ren et al., 2024, Yuan et al., 21 Apr 2025, Qin et al., 11 Nov 2025).
1. Fundamental Principles and Core Concepts
At its foundation, generative semantic communication departs from the conventional Shannon paradigm, which focuses on bit-level reliability, and re-orients communication toward semantic fidelity. In semantic communication (SemCom), a transmitter equipped with background knowledge extracts only the "core semantic features" of a source message (S) and transmits those features alone. The receiver, which holds an equivalent knowledge base, decodes them and reconstructs the intended meaning; bit errors are tolerated as long as they do not increase the semantic distortion (Xia et al., 2023). GSC generalizes this framework by integrating GAI models into each phase:
- Automated knowledge provisioning: GAI constructs global and personalized knowledge, ensuring knowledge alignment across nodes.
- Context reasoning and prompt-driven inference: Local and cloud-based GAI models extract minimal semantic prompts (keywords and communication goals), which drive both uplink traffic reduction and semantic-level JSCC.
- Multimodal generation and calibration: Powerful generative models (e.g., diffusion models, transformer architectures) at the receiver reconstruct or refine semantic payloads, correcting errors and inpainting missing information in a meaning-preserving manner (Xia et al., 2023, Ren et al., 2024, Qin et al., 11 Nov 2025).
Unlike conventional SemCom, GSC shifts the decoder's role from "information recovery" to "information regeneration," with the receiver able to synthesize the desired output content directly from compact semantic codes, bypassing direct reconstruction of the original bitstream (Ren et al., 2024).
2. System Architectures and Algorithmic Designs
GSC system architectures commonly follow a cloud-edge-mobile collaborative stack, embedding both global and personalized knowledge bases and hierarchies of GAI models (Xia et al., 2023, Yuan et al., 21 Apr 2025, Ren et al., 2024). A representative workflow incorporates the following components:
- Mobile Layer (Terminal Devices): Local lightweight GAI (e.g., GPT-Neo) for on-device keyword extraction, goal identification, and post-decoding semantic calibration.
- Edge Layer (BS/MEC servers): Semantic JSCC encoders/decoders, channel-aware feature encoding, and edge-offloaded semantic processing.
- Cloud Layer: Large pre-trained GAI models (e.g., GPT-4, Stable Diffusion), responsible for pre-training, fine-tuning, generation of AI-generated content (AIGC), and knowledge repository management.
Typical Data Flow (Xia et al., 2023):
- The source message arrives at the terminal-device (TD) transmitter.
- Local GAI extracts minimal prompts.
- Uplink: prompts (+ goal) are transmitted to cloud GAI.
- Cloud GAI generates full-fidelity AIGC.
- Edge encodes AIGC using semantic JSCC and transmits compact semantic features.
- TD receiver reconstructs semantics and executes local GAI-based calibration.
This architecture allows for multimodal flexibility (text/image/video), dramatic bandwidth reduction (e.g., by transmitting only a handful of keywords), and robust, meaning-preserving content reconstruction even in harsh channel conditions (Xia et al., 2023, Ma et al., 24 Sep 2025, Ren et al., 2024, Grassucci et al., 2023).
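The data flow above can be traced end to end with a toy sketch. Everything here is a hypothetical stand-in: `extract_prompt` replaces the local GAI with a stopword filter, `cloud_generate` replaces the cloud model with a template string, and `edge_encode` uses plain UTF-8 bytes instead of a learned semantic JSCC encoder. The only point is to show which stage handles what, and that the uplink carries the prompt rather than the full message.

```python
from dataclasses import dataclass

# Hypothetical stopword list; a real local GAI would do proper keyword extraction.
STOPWORDS = {"a", "an", "the", "of", "on", "in", "at", "is", "and", "with"}

@dataclass
class Prompt:
    keywords: list
    goal: str

def extract_prompt(message: str, goal: str, max_keywords: int = 4) -> Prompt:
    """Stand-in for the local lightweight GAI: keep a few non-stopword tokens."""
    tokens = [t.strip(".,").lower() for t in message.split()]
    keywords = [t for t in tokens if t and t not in STOPWORDS][:max_keywords]
    return Prompt(keywords, goal)

def cloud_generate(prompt: Prompt) -> str:
    """Stand-in for the cloud GAI: expand the prompt into full content."""
    return f"[{prompt.goal}] generated content about: " + ", ".join(prompt.keywords)

def edge_encode(content: str) -> bytes:
    """Stand-in for semantic JSCC encoding (assumption: plain UTF-8 bytes)."""
    return content.encode("utf-8")

def receiver_decode(features: bytes) -> str:
    """Stand-in for receiver-side reconstruction (no calibration step here)."""
    return features.decode("utf-8")

message = "A red car is parked on the street at night with its lights on"
prompt = extract_prompt(message, goal="image")            # local GAI extracts prompts
uplink_bytes = len(" ".join(prompt.keywords).encode()) + len(prompt.goal.encode())
content = cloud_generate(prompt)                          # cloud GAI generates content
received = receiver_decode(edge_encode(content))          # edge encodes, TD decodes

print("keywords:", prompt.keywords)
print("uplink bytes:", uplink_bytes, "vs raw message bytes:", len(message.encode()))
```

In this toy run the uplink payload is a few keywords plus the goal tag, which is smaller than the raw message; the real saving in a deployed system depends on the models and the modality.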
3. Mathematical Frameworks and Joint Source–Channel Coding
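Semantic-level JSCC in GSC is fundamentally guided by minimizing expected semantic distortion under channel and rate constraints:
$$\min_{f,\,g}\ \mathbb{E}_{S,H}\Big[d_{\mathrm{sem}}\big(S,\ g(\eta_H(f(S)))\big)\Big] \quad \text{s.t.} \quad R \le R_{\max},$$
where $f$ and $g$ represent the semantic encoder and decoder, $S$ the semantic content, $H$ the channel state (with $\eta_H$ the channel under that state), and $R$ the bit-rate or transmission cost, bounded by a budget $R_{\max}$; $d_{\mathrm{sem}}$ is the semantic distortion measure (Xia et al., 2023). GAI-provisioned context (keywords, user profiles) is ingested into $f$ to inform task-oriented semantic compression.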
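As a rough numerical illustration of this objective, the sketch below estimates expected semantic distortion by Monte Carlo for a deliberately tiny setup, then picks the best encoder setting under a cost budget. All modelling choices are assumptions made for the example: the "meaning" is a binary label, the encoder maps it to a signed amplitude, the channel is AWGN, the distortion is 0/1 meaning mismatch, and the transmission cost is the squared amplitude.

```python
import random

def encode(label: int, amplitude: float) -> float:
    """Toy semantic encoder: map a binary meaning to a signed amplitude."""
    return amplitude if label == 1 else -amplitude

def channel(x: float, noise_std: float, rng: random.Random) -> float:
    """AWGN channel with the given noise standard deviation."""
    return x + rng.gauss(0.0, noise_std)

def decode(y: float) -> int:
    """Toy semantic decoder: threshold at zero."""
    return 1 if y >= 0 else 0

def semantic_distortion(s: int, s_hat: int) -> float:
    """0/1 distortion: did the recovered meaning match the source meaning?"""
    return 0.0 if s == s_hat else 1.0

def expected_distortion(amplitude: float, noise_std: float,
                        trials: int = 20000, seed: int = 0) -> float:
    """Monte Carlo estimate of the expected semantic distortion."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        s = rng.randint(0, 1)
        s_hat = decode(channel(encode(s, amplitude), noise_std, rng))
        total += semantic_distortion(s, s_hat)
    return total / trials

def best_amplitude(candidates, power_budget: float, noise_std: float) -> float:
    """Grid search: lowest estimated distortion among settings within budget."""
    feasible = [a for a in candidates if a * a <= power_budget]
    return min(feasible, key=lambda a: expected_distortion(a, noise_std))

if __name__ == "__main__":
    for a in (0.5, 1.0, 1.5, 2.0):
        print(a, expected_distortion(a, noise_std=1.0))
    print("chosen:", best_amplitude([0.5, 1.0, 1.5, 2.0], 2.5, 1.0))
```

A learned JSCC system replaces the grid search with gradient-based training of the encoder and decoder, but the structure is the same: an expectation over sources and channel realizations, minimized subject to a cost limit.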
At the receiver, preliminary semantic features are reconstructed and then refined via generative AI (e.g., diffusion-based inpainting, transformer-based hallucination), further reducing semantic distortion and improving reconstruction quality, reported as PSNR/SSIM gains (Ma et al., 24 Sep 2025, Ren et al., 2024, Qin et al., 11 Nov 2025).
Model Optimization: GSC architectures frequently entail end-to-end training objectives that balance semantic distortion, communication rate, and adversarial/perceptual losses:
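$$\mathcal{L} = \mathcal{L}_{\mathrm{sem}} + \lambda_{R}\,R + \lambda_{\mathrm{adv}}\,\mathcal{L}_{\mathrm{adv}} + \lambda_{\mathrm{perc}}\,\mathcal{L}_{\mathrm{perc}},$$
where $\mathcal{L}_{\mathrm{sem}}$ is the semantic distortion, $R$ the communication rate, $\mathcal{L}_{\mathrm{adv}}$ and $\mathcal{L}_{\mathrm{perc}}$ the adversarial and perceptual losses, and the $\lambda$ weights set the trade-off between the terms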
(Ma et al., 24 Sep 2025, Ren et al., 2024, Yuan et al., 21 Apr 2025)
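A minimal sketch of such a balanced objective is a weighted sum of the individual terms. The function name, the default weights, and the numeric inputs below are illustrative assumptions, not values taken from the cited works; in practice each term would be computed from model outputs inside a training loop.

```python
import math

def gsc_training_loss(semantic_distortion: float,
                      rate: float,
                      adversarial_loss: float,
                      perceptual_loss: float,
                      lambda_rate: float = 0.01,
                      lambda_adv: float = 0.1,
                      lambda_perc: float = 1.0) -> float:
    """Weighted sum of distortion, rate, adversarial, and perceptual terms.

    The weights are hypothetical defaults chosen only for illustration.
    """
    return (semantic_distortion
            + lambda_rate * rate
            + lambda_adv * adversarial_loss
            + lambda_perc * perceptual_loss)

# Two hypothetical operating points: A spends more rate for lower distortion.
point_a = dict(semantic_distortion=0.10, rate=200.0,
               adversarial_loss=0.5, perceptual_loss=0.20)
point_b = dict(semantic_distortion=0.30, rate=50.0,
               adversarial_loss=0.5, perceptual_loss=0.35)

for lam in (0.0001, 0.01):
    loss_a = gsc_training_loss(**point_a, lambda_rate=lam)
    loss_b = gsc_training_loss(**point_b, lambda_rate=lam)
    print(f"lambda_rate={lam}: A={loss_a:.3f}, B={loss_b:.3f}")
```

With a small rate weight the higher-rate point A has the lower loss, and with a larger rate weight the lower-rate point B wins, which is the trade-off the weighting is meant to expose.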
Semantic importance of different features or prompts enables semantic-aware resource allocation, as detailed in GSC power allocation frameworks (Xu et al., 2024, Xu et al., 2024), which adapt transmission power and coding based on the semantic value and perceptual contribution of each stream to minimize energy usage while maintaining fidelity.
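To make the idea of semantic-aware allocation concrete, the sketch below splits a total power budget across streams in proportion to an importance score, with an optional per-stream floor. This proportional rule is an assumption chosen for simplicity; it is not the optimization used in the cited power allocation frameworks, and the importance scores are hypothetical inputs.

```python
def allocate_power(importance, total_power: float, min_power: float = 0.0):
    """Split total_power across streams in proportion to semantic importance.

    importance: non-negative scores, one per semantic stream (hypothetical).
    min_power:  floor given to every stream before the proportional split.
    """
    n = len(importance)
    if n == 0:
        return []
    if any(w < 0 for w in importance):
        raise ValueError("importance scores must be non-negative")
    if min_power * n > total_power:
        raise ValueError("budget too small for the per-stream floor")
    remaining = total_power - min_power * n
    total_weight = sum(importance)
    if total_weight == 0:
        # No stream is marked as important: fall back to an equal split.
        return [total_power / n] * n
    return [min_power + remaining * w / total_weight for w in importance]

# Example: an object-mask stream scored 3x as important as two texture streams.
powers = allocate_power([3.0, 1.0, 1.0], total_power=10.0)
print(powers)
```

The allocation always sums to the budget, and the stream with the highest semantic score receives the largest share.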
4. Generative Modeling Techniques for Semantic Content Regeneration
GSC exploits a spectrum of generative AI architectures:
- Variational Autoencoders (VAEs): Compact latent variable models enabling dimension adaptation, semantic rate distortion, and robust noisy channel handling (Ren et al., 2024, Barbarossa et al., 2023).
- Generative Adversarial Networks (GANs): High-fidelity reconstruction, semantic style transfer, and strong adversarial robustness (Ren et al., 2024).
- Diffusion Models: Denoising score-based architectures allowing for controllable, conditional, and classifier-free guided semantic reconstruction. These models underpin state-of-the-art human-centric, machine-centric, and intent-centric GSC systems, achieving robust semantic preservation, low bitrates (as low as 0.003 bpp), and stable training dynamics (Ren et al., 2024, Grassucci et al., 2023, Qin et al., 11 Nov 2025, Ma et al., 24 Sep 2025, Du et al., 2023, Fu et al., 2024).
Conditional diffusion models support guided regeneration based on auxiliary semantic cues (segmentation masks, textual prompts, etc.), allowing fine-grained control over the generated output and efficient adaptation to new domains (Qin et al., 11 Nov 2025, Grassucci et al., 2023). Lightweight deployment strategies (quantization, LoRA) enable the practical use of large GAI models on resource-constrained devices (Ma et al., 24 Sep 2025).
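The classifier-free guidance step mentioned above is commonly written as a linear combination of the conditional and unconditional noise predictions. The sketch below shows only that combination on plain Python lists; the two predictions are hypothetical numbers standing in for the outputs of a denoising network, and the scale convention (1.0 recovers the conditional prediction) is one of two conventions in common use.

```python
def cfg_noise_estimate(eps_uncond, eps_cond, guidance_scale: float):
    """Classifier-free guidance: eps_uncond + scale * (eps_cond - eps_uncond).

    scale = 0 -> unconditional prediction
    scale = 1 -> conditional prediction
    scale > 1 -> extrapolates further toward the conditioning signal
    """
    if len(eps_uncond) != len(eps_cond):
        raise ValueError("predictions must have the same length")
    return [u + guidance_scale * (c - u) for u, c in zip(eps_uncond, eps_cond)]

# Hypothetical per-element noise predictions from a denoiser run twice:
# once without conditioning and once with a semantic cue (e.g. a text prompt).
eps_uncond = [0.0, 1.0]
eps_cond = [1.0, 3.0]

for scale in (0.0, 1.0, 2.0):
    print(scale, cfg_noise_estimate(eps_uncond, eps_cond, scale))
```

In a GSC receiver the conditioning signal would be the received semantic cue, so a larger scale ties the regenerated content more tightly to what was transmitted, usually at some cost in sample diversity.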
5. Performance Characteristics and Empirical Results
Reported experiments show GSC improving on conventional and early semantic communication baselines in both bit-efficiency and fidelity.
Bit-Efficiency and Fidelity:
- GAI-SCN achieves a ~50% reduction in transmitted bits versus traditional SemCom and ~76% versus non-semantic communication, with a slight PSNR improvement (28.64 dB, versus 28.25 dB for pure SemCom and 28.05 dB for the non-semantic baseline) (Xia et al., 2023).
- LLM-based generative SemCom for video retrieval realizes a 99.98% reduction in communication overhead (0.036 Mb vs. 219 Mb) and an improvement of roughly 53.6 percentage points in retrieval accuracy (93.03% vs. 39.39%) relative to MPEG+LDPC+QAM (Ren et al., 2024).
- Diffusion-based GSC frameworks in image transmission yield PSNR/SSIM/LPIPS gains and consistently outperform CNN-based DeepJSCC, BPG+LDPC, and vanilla SemCom, even in severe noise regimes (Ma et al., 24 Sep 2025, Grassucci et al., 2023, Zhang et al., 2024, Li et al., 2024).
Multimodal and Scalable Deployment: GSC natively supports multimodal content (text, image, video, speech), enables mobile–edge–cloud collaborative architectures, and scales to resource-constrained and multi-user scenarios through model quantization, incremental prompt transmission, and asynchronous scheduling (Xia et al., 2023, Ma et al., 24 Sep 2025, Ren et al., 2024, Zhang et al., 2024).
Table: Example Quantitative Gains (Image Communication, AWGN Channel)
| Scheme | Bits per 300 images | PSNR (dB) | Overhead reduction vs. Traditional+GAI |
|---|---|---|---|
| Traditional+GAI | 1.28 × 10⁵ | 28.05 | – |
| Pure SemCom | 5.99 × 10⁴ | 28.25 | ~53% |
| GAI-SCN (proposed) | 3.03 × 10⁴ | 28.64 | ~76% |
These improvements are underpinned by the strategy to transmit only task- or meaning-relevant sub-symbolic or symbolic representations (e.g., prompts, semantic maps), using downstream GAI models to reconstruct plausible, detailed content (Yuan et al., 21 Apr 2025, Ren et al., 2024, Qin et al., 11 Nov 2025).
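The percentages quoted in this section can be re-derived from the raw figures. The short script below does the arithmetic using only the numbers given in the table and in the bullets above; the helper name `reduction_percent` is introduced here for illustration.

```python
def reduction_percent(baseline: float, value: float) -> float:
    """Relative reduction of `value` with respect to `baseline`, in percent."""
    return 100.0 * (1.0 - value / baseline)

# Bits per 300 images, as listed in the table.
bits = {
    "Traditional+GAI": 1.28e5,
    "Pure SemCom": 5.99e4,
    "GAI-SCN": 3.03e4,
}

semcom_vs_traditional = reduction_percent(bits["Traditional+GAI"], bits["Pure SemCom"])
gsc_vs_traditional = reduction_percent(bits["Traditional+GAI"], bits["GAI-SCN"])
gsc_vs_semcom = reduction_percent(bits["Pure SemCom"], bits["GAI-SCN"])

# Video retrieval figures from the bullet list (Mb transmitted, accuracy in %).
video_overhead_reduction = reduction_percent(219.0, 0.036)
accuracy_gain_points = 93.03 - 39.39

print(f"Pure SemCom vs Traditional+GAI: {semcom_vs_traditional:.1f}%")
print(f"GAI-SCN vs Traditional+GAI:     {gsc_vs_traditional:.1f}%")
print(f"GAI-SCN vs Pure SemCom:         {gsc_vs_semcom:.1f}%")
print(f"Video overhead reduction:       {video_overhead_reduction:.2f}%")
print(f"Accuracy gain (points):         {accuracy_gain_points:.2f}")
```

The results match the rounded values reported above: roughly 53% and 76% against the non-semantic baseline, roughly 50% for GAI-SCN against pure SemCom, and 99.98% for the video retrieval case.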
6. Open Challenges and Future Directions
Several core challenges for GSC remain under active investigation:
- Model Complexity, Computation, and Latency: Large GAI and diffusion models strain device resources and increase end-to-end latency. Model compression, pruning, and split-computation (cloud-edge) architectures are leading mitigation strategies (Xia et al., 2023, Ma et al., 24 Sep 2025, Ren et al., 2024, Qin et al., 11 Nov 2025).
- Semantic Uncertainty and Generation Control: AIGC outputs can exhibit randomness and prompt instability; bounding this with controlled generation and prompt-engineering is an open problem (Xia et al., 2023, Ren et al., 2024).
- Knowledge Synchronization: Ensuring transmitter–receiver knowledge alignment, especially with evolving personalized data, requires new protocols for prompt synchronization, secure model updates, and federated learning (Xia et al., 2023, Yuan et al., 21 Apr 2025).
- Security and Privacy: Semantic coding with powerful GAI models opens new attack surfaces (e.g., prompt leakage, membership inference, model poisoning). Solutions involve privacy-preserving training, blockchain-based accountability, and secure channel/semantic coding (Du et al., 2023, Ren et al., 2024).
- Semantic Metrics and Standards: There is a lack of standardized, well-accepted semantic fidelity and perceptual quality metrics tailored to regenerated content, complicating system evaluation and adoption (Xia et al., 2023, Ren et al., 2024, Yuan et al., 21 Apr 2025).
- Joint Physical–Semantic Layer Design: Integrating physical-layer constraints (MIMO, power allocation, adaptive beamforming) with semantic-level metrics to unlock further resource savings in dynamic, multi-user, or adversarial environments is an open challenge (Yuan et al., 21 Apr 2025, Xu et al., 2024, Xu et al., 2024, Qin et al., 11 Nov 2025).
7. Representative Applications and Case Studies
Generative semantic communications are enabling a wide spectrum of next-generation applications:
- Industrial IoT and V2X: Task-driven communication and situational awareness using semantic prompts and AI-guided reconstruction (Ren et al., 2024, Yuan et al., 21 Apr 2025).
- Metaverse and XR: Ultra-low-rate, low-latency transmission of semantic cues to synthesize immersive experiences at the receiver (Ren et al., 2024, Ren et al., 2024, Qin et al., 11 Nov 2025).
- Remote Monitoring, Video Retrieval, and Surveillance: Receiver-centric GSC architectures allow the receiver to request and obtain only the desired semantic information, dramatically reducing transmission and processing load (Liu et al., 2024, Yang et al., 2023).
- Autonomous Driving and Smart Cities: Efficient sharing of only the salient semantic features supports bandwidth-constrained, safety-critical communications (Liang et al., 2023).
- Text-to-Speech and Multimodal Synthesis: Hierarchical semantic knowledge bases and diffusion-based generative decoding yield significantly higher fidelity than traditional or autoencoder-based baselines in AWGN and Rayleigh scenarios (Zheng et al., 2024).
In summary, generative semantic communications couple the compactness and robustness of semantic-level representation with the powerful reconstruction capabilities of state-of-the-art generative models, establishing a foundation for ultra-efficient, meaning-centric, and robust multimodal communications in emerging wireless and distributed intelligence systems (Xia et al., 2023, Ren et al., 2024, Ma et al., 24 Sep 2025, Qin et al., 11 Nov 2025, Liang et al., 2023, Yuan et al., 21 Apr 2025).