
Generative Semantic Communications

Updated 1 June 2026
  • Generative semantic communications are advanced systems that leverage AI to extract and transmit only task-relevant semantics, reducing bandwidth consumption.
  • They employ powerful generative models like transformers and diffusion models to reconstruct and calibrate content across modalities while minimizing semantic distortion.
  • These systems enable mobile–edge–cloud collaboration, fusing context reasoning, semantic JSCC, and automated knowledge provisioning to achieve significant resource savings and improved perceptual quality.

Generative semantic communications (GSC) represent a paradigm shift in the design of communication systems, in which the delivery objective transitions from transmitting bit-accurate data to conveying meaning and enabling information regeneration at the receiver. By integrating advanced generative artificial intelligence (GAI) models—particularly large-scale generative models—into the semantic communication loop, GSC systems achieve highly efficient, robust, and flexible content delivery across modalities such as text, images, video, and speech. GSC leverages context reasoning, automated background-knowledge provisioning, and AI-driven joint source–channel coding (JSCC) to transmit only the core task-relevant semantics, allowing the receiver to reconstruct, regenerate, or calibrate content, thereby minimizing semantic distortion and drastically reducing resource consumption while preserving meaning (Xia et al., 2023, Ren et al., 2024, Yuan et al., 21 Apr 2025, Qin et al., 11 Nov 2025).

1. Fundamental Principles and Core Concepts

At its foundation, generative semantic communication departs from the conventional Shannon paradigm, which focuses on bit-level reliability, and re-orients communication toward semantic fidelity. In semantic communication (SemCom), a transmitter equipped with background knowledge extracts only the "core semantic features" of a source message ($S$), which are then transmitted. The receiver, possessing an equivalent knowledge base, decodes and reconstructs the intended meaning, permitting bit errors if they do not increase the semantic distortion function $D_\text{sem}$ (Xia et al., 2023). GSC generalizes this framework by integrating GAI models into each phase:

  • Automated knowledge provisioning: GAI constructs global and personalized knowledge, ensuring knowledge alignment across nodes.
  • Context reasoning and prompt-driven inference: Local and cloud-based GAI models extract minimal semantic prompts (keywords and communication goals), which drive both uplink traffic reduction and semantic-level JSCC.
  • Multimodal generation and calibration: Powerful generative models (e.g., diffusion models, transformer architectures) at the receiver reconstruct or refine semantic payloads, correcting errors and inpainting missing information in a meaning-preserving manner (Xia et al., 2023, Ren et al., 2024, Qin et al., 11 Nov 2025).

Unlike conventional SemCom, GSC shifts the decoder's role from "information recovery" to "information regeneration," with the receiver able to synthesize the desired output content $\tilde{x}$ directly from compact semantic codes, bypassing direct reconstruction of the original bitstream (Ren et al., 2024).

2. System Architectures and Algorithmic Designs

GSC system architectures commonly follow a cloud-edge-mobile collaborative stack, embedding both global and personalized knowledge bases and hierarchies of GAI models (Xia et al., 2023, Yuan et al., 21 Apr 2025, Ren et al., 2024). A representative workflow incorporates the following components:

  • Mobile Layer (Terminal Devices, TD): Local lightweight GAI (e.g., GPT-Neo) for on-device keyword extraction, goal identification, and post-decoding semantic calibration.
  • Edge Layer (base station (BS) and mobile edge computing (MEC) servers): Semantic JSCC encoders/decoders, channel-aware feature encoding, and edge-offloaded semantic processing.
  • Cloud Layer: Large pre-trained GAI models (e.g., GPT-4, Stable Diffusion), responsible for pre-training, fine-tuning, producing AI-generated content (AIGC), and knowledge repository management.

Typical Data Flow (Xia et al., 2023):

  1. Source message arrives at TD transmitter.
  2. Local GAI extracts minimal prompts.
  3. Uplink: prompts (+ goal) are transmitted to cloud GAI.
  4. Cloud GAI generates full-fidelity AIGC.
  5. Edge encodes AIGC using semantic JSCC and transmits compact semantic features.
  6. TD receiver reconstructs semantics and executes local GAI-based calibration.
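The six steps above can be traced with a toy pipeline. This is only an illustrative sketch: every function name (`extract_prompt`, `cloud_generate`, `edge_encode`, `receiver_decode`, `run_pipeline`) is hypothetical, and each stage is a trivial stand-in for the GAI model or semantic JSCC codec that a real system would use.

```python
from dataclasses import dataclass


@dataclass
class Prompt:
    keywords: list[str]
    goal: str


def extract_prompt(message: str, goal: str, max_keywords: int = 3) -> Prompt:
    """Step 2 stand-in: keep the longest distinct words as 'keywords'."""
    words = {w.strip(".,").lower() for w in message.split()}
    ranked = sorted(words, key=lambda w: (-len(w), w))
    return Prompt(keywords=ranked[:max_keywords], goal=goal)


def cloud_generate(prompt: Prompt) -> str:
    """Steps 3-4 stand-in: a real system would call a large generative model."""
    return f"[{prompt.goal}] content about " + ", ".join(prompt.keywords)


def edge_encode(content: str) -> bytes:
    """Step 5 stand-in: UTF-8 bytes in place of learned semantic JSCC features."""
    return content.encode("utf-8")


def receiver_decode(features: bytes) -> str:
    """Step 6 stand-in: decode, then 'calibrate' by normalising whitespace."""
    return " ".join(features.decode("utf-8", errors="replace").split())


def run_pipeline(message: str, goal: str) -> tuple[Prompt, str]:
    prompt = extract_prompt(message, goal)      # step 2: minimal prompt
    content = cloud_generate(prompt)            # steps 3-4: uplink + cloud AIGC
    features = edge_encode(content)             # step 5: edge encoding
    return prompt, receiver_decode(features)    # step 6: receiver side
```

Calling `run_pipeline("A red truck crosses the bridge.", "image")` sends only three keywords and a goal on the uplink; the point of the sketch is the division of labour between layers, not the quality of any single stage.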

This architecture allows for multimodal flexibility (text/image/video), dramatic bandwidth reduction (e.g., by transmitting only a handful of keywords), and robust, meaning-preserving content reconstruction even in harsh channel conditions (Xia et al., 2023, Ma et al., 24 Sep 2025, Ren et al., 2024, Grassucci et al., 2023).

3. Mathematical Frameworks and Joint Source–Channel Coding

Semantic-level JSCC in GSC is fundamentally guided by minimizing expected semantic distortion $D_\text{sem}(S, \hat{S})$ under channel and rate constraints:

$$\min_{f_\theta, g_\phi} \mathbb{E}_{S, H}\left[ D_\text{sem}\left(S, g_\phi( f_\theta(S), H )\right) \right] \quad\text{subject to}\quad R(f_\theta) \leq R_\text{max}$$

where $f_\theta$ and $g_\phi$ represent the semantic encoder and decoder, $S$ the semantic content, $H$ the channel state, and $R(\cdot)$ the bit rate or transmission cost (Xia et al., 2023). GAI-provisioned context (keywords, user profiles) is ingested into $f_\theta$ to inform task-oriented semantic compression.
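To make the constrained objective concrete, the following sketch estimates the expected distortion by Monte Carlo for a deliberately simple setup. All modelling choices here are assumptions for illustration, not the cited method: the encoder keeps the first `k` coordinates of a random vector, the channel is AWGN only, the decoder zero-fills what was not sent, and the distortion is one minus cosine similarity. The hypothetical `best_rate` helper then searches over `k <= r_max`.

```python
import numpy as np


def semantic_distortion(s, s_hat):
    """Assumed D_sem: one minus cosine similarity, per sample."""
    num = np.sum(s * s_hat, axis=1)
    den = np.linalg.norm(s, axis=1) * np.linalg.norm(s_hat, axis=1) + 1e-12
    return 1.0 - num / den


def expected_distortion(k, snr_db, n_samples=2000, dim=16, seed=0):
    """Monte Carlo estimate of E[D_sem] when k of dim features are sent."""
    rng = np.random.default_rng(seed)
    s = rng.standard_normal((n_samples, dim))          # toy semantic vectors S
    x = s[:, :k]                                       # encoder: keep first k coordinates
    x = x / np.sqrt(np.mean(x ** 2))                   # unit average transmit power
    noise_std = 10 ** (-snr_db / 20)                   # AWGN std for the given SNR
    y = x + noise_std * rng.standard_normal(x.shape)   # channel: additive noise only
    s_hat = np.zeros_like(s)
    s_hat[:, :k] = y                                   # decoder: zero-fill unsent features
    return float(np.mean(semantic_distortion(s, s_hat)))


def best_rate(r_max, snr_db):
    """Pick the k <= r_max that minimises the estimated distortion."""
    scores = {k: expected_distortion(k, snr_db) for k in range(1, r_max + 1)}
    return min(scores, key=scores.get), scores
```

In this toy model the distortion falls as `k` or the SNR grows, so the rate constraint is always active; a learned encoder and a generative decoder would change the numbers but not the structure of the trade-off.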

At the receiver, preliminary semantic features are reconstructed and then refined via generative AI (e.g., diffusion-based inpainting, transformer-based hallucination), further reducing semantic distortion and enhancing perceptual realism (PSNR/SSIM gains) (Ma et al., 24 Sep 2025, Ren et al., 2024, Qin et al., 11 Nov 2025).

Model Optimization: GSC architectures frequently entail end-to-end training objectives that balance semantic distortion, communication rate, and adversarial/perceptual losses (Ma et al., 24 Sep 2025, Ren et al., 2024, Yuan et al., 21 Apr 2025).
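The exact training objective differs between the cited works. One generic weighted form that combines the three ingredients named above is shown here as an assumption, not as the loss of any specific paper; the weights $\lambda_R$, $\lambda_\text{adv}$, $\lambda_\text{per}$ and the loss terms $\mathcal{L}_\text{adv}$, $\mathcal{L}_\text{per}$ are hypothetical symbols introduced for illustration:

```latex
\mathcal{L}(\theta, \phi)
  = \mathbb{E}_{S, H}\left[ D_\text{sem}\left(S, g_\phi(f_\theta(S), H)\right) \right]
  + \lambda_R \, R(f_\theta)
  + \lambda_\text{adv} \, \mathcal{L}_\text{adv}
  + \lambda_\text{per} \, \mathcal{L}_\text{per}
```

The first two terms are the Lagrangian relaxation of the rate-constrained problem stated earlier in this section, with $\lambda_R \geq 0$ playing the role of the multiplier for $R(f_\theta) \leq R_\text{max}$; the last two terms stand for the adversarial and perceptual penalties.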

Semantic importance of different features or prompts enables semantic-aware resource allocation, as detailed in GSC power allocation frameworks (Xu et al., 2024, Xu et al., 2024), which adapt transmission power and coding based on the semantic value and perceptual contribution of each stream to minimize energy usage while maintaining fidelity.
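As a minimal illustration of importance-driven allocation, the sketch below splits a power budget across streams in proportion to a semantic importance score. This proportional rule is an assumption chosen for simplicity and is not the optimization used in the cited power allocation frameworks; `allocate_power` and its `floor` parameter are hypothetical names.

```python
def allocate_power(importance, total_power, floor=0.0):
    """Split total_power across streams in proportion to semantic importance.

    Each stream first receives `floor`; the remainder is shared proportionally.
    """
    n = len(importance)
    if n == 0:
        return []
    if any(w < 0 for w in importance):
        raise ValueError("importance weights must be non-negative")
    remaining = total_power - floor * n
    if remaining < 0:
        raise ValueError("floor is too large for the power budget")
    weight_sum = sum(importance)
    if weight_sum == 0:
        return [total_power / n] * n   # no preference: equal split
    return [floor + remaining * w / weight_sum for w in importance]
```

For example, with importance scores `[3, 1]` and a budget of 8 units, the more important stream receives three quarters of the power; setting `floor` guarantees a minimum to low-importance streams so they are not starved entirely.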

4. Generative Modeling Techniques for Semantic Content Regeneration

GSC exploits a spectrum of generative AI architectures, including the transformer and diffusion models introduced above.

Conditional diffusion models support guided regeneration based on auxiliary semantic cues (segmentation masks, textual prompts, etc.), allowing fine-grained control over the generated output and efficient adaptation to new domains (Qin et al., 11 Nov 2025, Grassucci et al., 2023). Lightweight deployment strategies (quantization, LoRA) enable the practical use of large GAI models on resource-constrained devices (Ma et al., 24 Sep 2025).
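The two lightweight deployment techniques mentioned above can be sketched in a few lines of NumPy. This is a generic illustration of LoRA-style low-rank updates and symmetric int8 quantization under assumed shapes, not the configuration used in the cited work; the function names are hypothetical.

```python
import numpy as np


def lora_delta(a, b, alpha):
    """Low-rank update (alpha / r) * B @ A, with A of shape (r, d_in), B of shape (d_out, r)."""
    r = a.shape[0]
    return (alpha / r) * (b @ a)


def trainable_fraction(d_out, d_in, r):
    """Share of parameters trained by LoRA relative to fully fine-tuning one weight matrix."""
    return r * (d_in + d_out) / (d_in * d_out)


def quantize_int8(w):
    """Symmetric per-tensor int8 quantization; returns the integer tensor and its scale."""
    scale = float(np.max(np.abs(w))) / 127.0 or 1.0   # fall back to 1.0 for an all-zero tensor
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale
```

For a 1024 × 1024 weight matrix and rank 8, `trainable_fraction` gives 1/64 of the full parameter count, which is the kind of saving that makes on-device adaptation plausible; quantization reduces storage per weight from 32 bits to 8 at the cost of a rounding error bounded by half the scale.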

5. Performance Characteristics and Empirical Results

Extensive empirical studies demonstrate the superior efficiency and fidelity of GSC over conventional and early semantic communication baselines.

Bit-Efficiency and Fidelity:

  • GAI-SCN achieves ~50% reduction in transmitted bits versus traditional SemCom and ~76% versus non-semantic communication, with slight improvements in PSNR (from 28.05 dB to 28.64 dB) (Xia et al., 2023).
  • LLM-based generative SemCom for video retrieval realizes a 99.98% reduction in communication overhead (0.036 Mb vs. 219 Mb) and a gain of about 53.6 percentage points in retrieval accuracy (93.03% vs. 39.39%) relative to MPEG+LDPC+QAM (Ren et al., 2024).
  • Diffusion-based GSC frameworks in image transmission yield PSNR/SSIM/LPIPS gains and consistently outperform CNN-based DeepJSCC, BPG+LDPC, and vanilla SemCom, even in severe noise regimes (Ma et al., 24 Sep 2025, Grassucci et al., 2023, Zhang et al., 2024, Li et al., 2024).

Multimodal and Scalable Deployment: GSC natively supports multimodal content (text, image, video, speech), enables mobile–edge–cloud collaborative architectures, and scales to resource-constrained and multi-user scenarios through model quantization, incremental prompt transmission, and asynchronous scheduling (Xia et al., 2023, Ma et al., 24 Sep 2025, Ren et al., 2024, Zhang et al., 2024).

Table: Example Quantitative Gains (Image Communication, AWGN Channel)

| Scheme | Bits per 300 images | PSNR (dB) | Overhead reduction |
|---|---|---|---|
| Traditional+GAI | 1.28 × 10⁵ | 28.05 | baseline |
| Pure SemCom | 5.99 × 10⁴ | 28.25 | ~53% |
| GAI-SCN (proposed) | 3.03 × 10⁴ | 28.64 | ~76% |

(Xia et al., 2023)
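The percentages quoted in this section follow directly from the reported bit counts. The short check below recomputes them; the helper name `reduction_percent` is hypothetical, and the inputs are the figures stated in the text.

```python
def reduction_percent(baseline, value):
    """Percentage by which `value` is smaller than `baseline`."""
    return 100.0 * (1.0 - value / baseline)


# Bits per 300 images, as reported for the image-communication example.
bits = {"Traditional+GAI": 1.28e5, "Pure SemCom": 5.99e4, "GAI-SCN": 3.03e4}

semcom_vs_traditional = reduction_percent(bits["Traditional+GAI"], bits["Pure SemCom"])
gai_vs_traditional = reduction_percent(bits["Traditional+GAI"], bits["GAI-SCN"])
gai_vs_semcom = reduction_percent(bits["Pure SemCom"], bits["GAI-SCN"])

# Video retrieval example: 0.036 Mb versus 219 Mb, and accuracy in percent.
video_overhead_reduction = reduction_percent(219.0, 0.036)
accuracy_gain_points = 93.03 - 39.39
```

The two reductions relative to Traditional+GAI reproduce the table's last column, and the GAI-SCN versus Pure SemCom figure reproduces the roughly 50% saving quoted in the bullet list.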

These improvements are underpinned by the strategy to transmit only task- or meaning-relevant sub-symbolic or symbolic representations (e.g., prompts, semantic maps), using downstream GAI models to reconstruct plausible, detailed content (Yuan et al., 21 Apr 2025, Ren et al., 2024, Qin et al., 11 Nov 2025).

6. Open Challenges and Future Directions

Several core challenges for GSC remain under active investigation.

7. Representative Applications and Case Studies

Generative semantic communications are enabling a wide spectrum of next-generation applications:

  • Industrial IoT and V2X: Task-driven communication and situational awareness using semantic prompts and AI-guided reconstruction (Ren et al., 2024, Yuan et al., 21 Apr 2025).
  • Metaverse and XR: Ultra-low-rate, low-latency transmission of semantic cues to synthesize immersive experiences at the receiver (Ren et al., 2024, Ren et al., 2024, Qin et al., 11 Nov 2025).
  • Remote Monitoring, Video Retrieval, and Surveillance: Receiver-centric GSC architectures allow the receiver to request and obtain only the desired semantic information, dramatically reducing transmission and processing load (Liu et al., 2024, Yang et al., 2023).
  • Autonomous Driving and Smart Cities: Efficient sharing of only the salient semantic features supports bandwidth-constrained, safety-critical communications (Liang et al., 2023).
  • Text-to-Speech and Multimodal Synthesis: Hierarchical semantic knowledge bases and diffusion-based generative decoding yield significantly higher fidelity than traditional or autoencoder-based baselines in AWGN and Rayleigh scenarios (Zheng et al., 2024).

In summary, generative semantic communications couple the compactness and robustness of semantic-level representation with the powerful reconstruction capabilities of state-of-the-art generative models, establishing a foundation for ultra-efficient, meaning-centric, and robust multimodal communications in emerging wireless and distributed intelligence systems (Xia et al., 2023, Ren et al., 2024, Ma et al., 24 Sep 2025, Qin et al., 11 Nov 2025, Liang et al., 2023, Yuan et al., 21 Apr 2025).
