Semantic Communication for Cooperative Swarms

Updated 24 November 2025

Semantic communication for cooperative swarms is a paradigm where agents share compressed, task-relevant data to optimize cooperation while minimizing bandwidth and energy costs.
The approach integrates per-agent feature extraction, semantic encoding/decoding, and adaptive physical-layer strategies such as OFDM and semantic HARQ to overcome wireless impairments.
Modular fusion techniques and LLM-driven protocols enable dynamic, goal-oriented coordination and multi-task cooperation across diverse sensor systems and network conditions.

Semantic communication for cooperative swarms is a paradigm in which multiple autonomous agents—such as vehicles, drones, or mobile robots—exchange only task-relevant semantic information, rather than raw sensory data, to collaboratively achieve global objectives in dynamic and resource-constrained environments. Central to this approach are methods for extracting, compressing, transmitting, and fusing distributed semantic representations over unreliable wireless links, with a focus on maximizing joint task performance, minimizing bandwidth and energy costs, and ensuring robust operation under noise, fading, and agent or topology variations.

1. System Architectures and Key Building Blocks

Semantic communication for cooperative swarms systematically organizes sensing, communication, and decision-making across agents. A general formulation (Sheng et al., 2023, Sheng et al., 2024, Razlighi et al., 2024) comprises:

Per-agent backbone: Each agent $k$ acquires local sensory observations $X^k$ (LiDAR, camera, radar, etc.) and extracts features $F^k = \Phi(X^k)$.
Importance map encoder: Network $P(\cdot)$ produces a soft saliency mask $C^k = P(F^k)$, yielding sparse, semantically compressed features $M^k = F^k \odot C^k$.
Semantic encoder/decoder: A CNN-based encoder $\Psi_s(\cdot)$ maps $M^k$ into channel symbols $T^k$ (e.g., $T^k \in \mathbb{C}^{H' \times W'}$), and the decoder $\Psi_d(\cdot)$ reconstructs $R^k \approx M^k$ from noisy receptions.
Physical layer adaptation: Transmission over AWGN, Rayleigh, or OFDM (with channel estimation and equalization) ensures resilience to time-varying fading and multipath (Sheng et al., 2023, Sheng et al., 2024).
Fusion and cooperative task head: Intermediate fusion module $\chi(\cdot)$ aggregates received features across the swarm, often via self-attention or graph neural networks, followed by a task head $\Gamma$ for final decision outputs (e.g., bounding boxes, control actions, semantic segmentation).
Extensions: Modular adaptability to multi-modal sensors, hierarchical communication rates (compression ratios $\mathrm{CR}$ ), feedback channels, and role-adaptive encoding is supported (Sheng et al., 2023, Razlighi et al., 2024, Xu et al., 2 Nov 2025).

Block Diagram (Generalized)

Raw Data (X^k)
    ↓    Φ(·)
Features (F^k)
    ↓    P(·)
Importance Mask (C^k)
    ↓    ⊙
Sparse Features (M^k)
    ↓    Ψ_s(·)
Encoded Symbols (T^k)
    ↓    Channel (AWGN/OFDM)
Received Symbols (T'^k)
    ↓    Ψ_d(·)
Reconstructed (R^k)
    ↓
Fusion χ({R^k, F^k})
    ↓
Task Head Γ
    ↓
Cooperative Output
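
The stages in the diagram can be traced with a toy, standard-library sketch. Everything here is a simplifying assumption rather than the cited systems: the mask is a top-k magnitude heuristic standing in for the learned network P, the encoder and decoder Ψ_s and Ψ_d are treated as identity maps, the channel is real-valued AWGN, and fusion is a plain element-wise mean in place of attention or a GNN. All function names are hypothetical.

```python
import random

def importance_mask(features, keep_ratio):
    """Binary mask keeping roughly the top keep_ratio of cells by magnitude.
    (Stand-in for the learned importance network P; ties may keep extra cells.)"""
    flat = sorted((abs(v) for row in features for v in row), reverse=True)
    k = max(1, round(keep_ratio * len(flat)))
    threshold = flat[k - 1]
    return [[1.0 if abs(v) >= threshold else 0.0 for v in row] for row in features]

def apply_mask(features, mask):
    """Sparse features M = F * C (element-wise product)."""
    return [[f * c for f, c in zip(fr, cr)] for fr, cr in zip(features, mask)]

def compression_ratio(mask):
    """Fraction of spatial cells with a nonzero mask value."""
    cells = [c for row in mask for c in row]
    return sum(1 for c in cells if c > 0) / len(cells)

def awgn_channel(sparse, mask, noise_std, rng):
    """Add Gaussian noise to the transmitted cells only; other cells stay zero."""
    return [[m + rng.gauss(0.0, noise_std) if c > 0 else 0.0
             for m, c in zip(mr, cr)] for mr, cr in zip(sparse, mask)]

def fuse_mean(local, received_maps):
    """Element-wise mean of the ego features and every received map."""
    maps = [local] + received_maps
    return [[sum(m[i][j] for m in maps) / len(maps)
             for j in range(len(local[0]))] for i in range(len(local))]

rng = random.Random(0)
F_ego = [[rng.gauss(0.0, 1.0) for _ in range(8)] for _ in range(8)]
F_peer = [[rng.gauss(0.0, 1.0) for _ in range(8)] for _ in range(8)]

C_peer = importance_mask(F_peer, keep_ratio=0.05)   # C^k
M_peer = apply_mask(F_peer, C_peer)                 # M^k
R_peer = awgn_channel(M_peer, C_peer, 0.1, rng)     # R^k (identity decoder)
fused = fuse_mean(F_ego, [R_peer])                  # input to the task head

print("compression ratio:", compression_ratio(C_peer))
```

With an 8 by 8 map and a 5% target, three cells are kept, so only those three values cross the channel.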

2. Mathematical Formalism and Loss Functions

Semantic communication frameworks formalize the extraction and transmission processes as end-to-end trainable systems under explicit task-centric constraints.

Importance map and compression: Given $F^k \in \mathbb{R}^{C \times H \times W}$, the importance map $C^k$ is constrained to be sparse, so that the compression ratio $\mathrm{CR} = |\{(i,j) : C^k_{i,j} > 0\}| / (H \cdot W)$ stays low, typically 1–5% (Sheng et al., 2023, Sheng et al., 2024).
End-to-end training objective:

$L_{\text{total}} = \lambda_1 L_{\text{rec}} + \lambda_2 L_{\text{per}}$

where

$L_{\text{rec}} = \frac{1}{N} \sum_i \|M^k_i - R^k_i\|_2^2$

$L_{\text{per}} = \frac{1}{N} \sum_i \text{(task-aware loss, e.g., smooth-}L_1+ \text{focal})$
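
As a worked illustration of the weighted objective, the following sketch evaluates the total loss on hand-picked numbers. It is a minimal sketch under assumptions: the task-aware term uses smooth-L1 only (the focal component is omitted), the inputs are plain Python lists, and the function names and the weights `lam1`, `lam2` are illustrative choices rather than values from the cited works.

```python
def reconstruction_loss(M, R):
    """L_rec: mean over samples of the squared L2 distance between M_i and R_i."""
    return sum(sum((m - r) ** 2 for m, r in zip(mi, ri))
               for mi, ri in zip(M, R)) / len(M)

def smooth_l1(pred, target, beta=1.0):
    """Smooth-L1 (Huber-style) loss for one scalar prediction."""
    d = abs(pred - target)
    return 0.5 * d * d / beta if d < beta else d - 0.5 * beta

def perception_loss(preds, targets):
    """L_per: mean task loss over samples (smooth-L1 only in this sketch)."""
    return sum(smooth_l1(p, t) for p, t in zip(preds, targets)) / len(preds)

def total_loss(M, R, preds, targets, lam1=1.0, lam2=1.0):
    """L_total = lam1 * L_rec + lam2 * L_per."""
    return lam1 * reconstruction_loss(M, R) + lam2 * perception_loss(preds, targets)

# Two samples of sparse features and their reconstructions.
M = [[1.0, 0.0], [0.0, 2.0]]
R = [[0.5, 0.0], [0.0, 2.0]]
# Two scalar regression outputs and their targets.
preds, targets = [0.0, 3.0], [0.5, 1.0]

print(reconstruction_loss(M, R))                       # 0.125
print(perception_loss(preds, targets))                 # 0.8125
print(total_loss(M, R, preds, targets, 1.0, 0.5))      # 0.53125
```

Raising `lam2` relative to `lam1` shifts training pressure from faithful feature reconstruction toward downstream task accuracy.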

Information-theoretic variants: In multi-task split-encoder CCMT architectures, mutual information maximization for distributed estimation is formalized as:

$\max_{CU, SU} \sum_{i=1}^N I(z_i; \hat X_{(1:K),i})$

with variational bounds and per-task cross-entropy or regression losses, where CU and SU denote the Common Unit and Specific Units of the split encoder (Razlighi et al., 2024).

Adaptive optimization in resource-constrained settings: Compression ratios and transmission power are optimized to maintain target task performance under bit budget and channel constraints (Zhao et al., 8 Oct 2025, Silva et al., 2023).
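
One simple way to picture this trade-off is a lookup over profiled operating points. The sketch below assumes an offline table mapping each candidate CR to a measured task accuracy, and picks the smallest CR that meets the accuracy target within the bit budget. The table values, the bits-per-cell figure, and the function name are placeholders invented for illustration; they are not results or methods from the cited papers, which also optimize transmission power.

```python
def select_compression_ratio(profile, target_accuracy, bit_budget,
                             cells, bits_per_cell):
    """Return the smallest CR that fits the bit budget and meets the accuracy
    target, or None if no profiled operating point qualifies."""
    for cr in sorted(profile):
        bits = cr * cells * bits_per_cell      # payload size at this CR
        if bits <= bit_budget and profile[cr] >= target_accuracy:
            return cr
    return None

# Hypothetical profile: CR -> task accuracy measured offline (made-up numbers).
profile = {0.01: 0.70, 0.02: 0.78, 0.05: 0.84}

choice = select_compression_ratio(
    profile, target_accuracy=0.75, bit_budget=4000,
    cells=100 * 100, bits_per_cell=16)
print(choice)  # 0.02
```

When the function returns `None`, the caller has to relax either the accuracy target or the budget, for example by waiting for a better channel state.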

3. Channel Models and Robustness to Wireless Impairments

Semantic communication systems for swarms are evaluated under diverse physical-layer challenges:

Channel types: AWGN, Rayleigh flat fading, multipath (3GPP TDL), and frequency-selective channels are integrated in the simulation and training pipeline (Sheng et al., 2023, Sheng et al., 2024, Zhao et al., 8 Oct 2025).
OFDM-based adaptation: Transmission blocks include pilot symbols for MMSE channel estimation, adaptive equalization, and application of semantic encoders robust to channel fading and Doppler (Sheng et al., 2024).
Semantic HARQ: Retransmission schemes are implemented at the semantic level via SimCRC, a Siamese ResNet+MLP predictor comparing the semantic similarity between reconstructed and reference features to trigger NACK and incremental redundancy (Sheng et al., 2024).
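
The retransmission logic can be sketched as a loop that keeps requesting copies until a similarity check passes. This is a minimal sketch under loud assumptions: cosine similarity replaces the learned Siamese SimCRC predictor, repeated receptions are simply averaged (closer to chase combining than to the incremental redundancy described above), and the reference features are assumed to be available to the checker, which the text here does not specify. All names are hypothetical.

```python
import math
import random

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na > 0 and nb > 0 else 0.0

def noisy_channel(symbols, noise_std, rng):
    """Real-valued AWGN channel."""
    return [s + rng.gauss(0.0, noise_std) for s in symbols]

def semantic_harq(features, reference, noise_std, threshold, max_rounds, rng):
    """Retransmit until the averaged estimate is similar enough to the reference.
    Returns (estimate, rounds_used, acked)."""
    if max_rounds < 1:
        raise ValueError("max_rounds must be at least 1")
    total = [0.0] * len(features)
    for round_idx in range(1, max_rounds + 1):
        received = noisy_channel(features, noise_std, rng)
        total = [t + r for t, r in zip(total, received)]
        estimate = [t / round_idx for t in total]       # average of all copies
        if cosine_similarity(estimate, reference) >= threshold:
            return estimate, round_idx, True            # ACK
        # otherwise: NACK, loop requests another transmission
    return estimate, max_rounds, False

rng = random.Random(1)
features = [1.0, 2.0, 3.0, 0.5]
estimate, rounds, acked = semantic_harq(
    features, features, noise_std=0.8, threshold=0.95, max_rounds=6, rng=rng)
print("rounds used:", rounds, "acked:", acked)
```

The threshold plays the role of the pass/fail decision: raising it trades extra retransmissions for higher semantic fidelity.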

Channel-Adaptation Guidelines

Scenario	Channel Strategy	Robustness Feature
Static/frequency-flat	Skip OFDM, use direct mapping	Lower latency
Fast fading/multipath	OFDM + pilot-based equalization / SimCRC-based HARQ	Resilience to burst errors
Bandwidth-constrained	Lower CR, higher semantic abstraction	Graceful degradation
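
To make the pilot-based row concrete, the sketch below estimates and removes a single flat Rayleigh fading tap. It is deliberately simpler than the systems described above: it uses a least-squares estimate with zero-forcing equalization rather than MMSE, one tap rather than a full OFDM grid, and one pilot symbol per block. Function names and parameters are illustrative assumptions.

```python
import random

def rayleigh_tap(rng):
    """One complex fading coefficient with unit average power."""
    s = 0.5 ** 0.5
    return complex(rng.gauss(0.0, s), rng.gauss(0.0, s))

def transmit(symbols, h, noise_std, rng):
    """Flat-fading channel: y = h * x + n with complex Gaussian noise."""
    return [h * x + complex(rng.gauss(0.0, noise_std), rng.gauss(0.0, noise_std))
            for x in symbols]

def estimate_and_equalize(received, pilot):
    """Treat the first received symbol as the pilot: least-squares channel
    estimate, then zero-forcing equalization of the remaining symbols."""
    h_hat = received[0] / pilot
    return h_hat, [y / h_hat for y in received[1:]]

rng = random.Random(7)
pilot = 1 + 0j
data = [1 + 1j, -1 + 1j, 1 - 1j]             # semantic symbols to protect
h = rayleigh_tap(rng)
received = transmit([pilot] + data, h, noise_std=0.05, rng=rng)
h_hat, equalized = estimate_and_equalize(received, pilot)
print("channel estimation error:", abs(h_hat - h))
```

In a deep fade (small |h|) the division amplifies noise, which is one reason an MMSE estimator and equalizer are usually preferred in practice.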

Semantic communication with JSCC exhibits "graceful degradation": task metrics decline smoothly as channel quality falls below the operating point, without the cliff effect typical of classical separate source-channel coding (Sheng et al., 2023, Sheng et al., 2024).

4. Fusion, Task Execution, and Multi-Task Cooperation

Semantic fusion aggregates the reconstructed representations from multiple agents, leveraging the redundancy and diversity of distributed perspectives.

Intermediate fusion: Cross-agent self-attention or graph neural network modules allow flexible aggregation of both local and received semantics (Sheng et al., 2023, Sheng et al., 2024).
Multi-task modularity: Split-encoder architectures (Common Unit + multiple Specific Units) enable simultaneous multi-task semantic communication and distributed estimation with collaborative decoding (receiver-side fusion of K noisy agent outputs per task) (Razlighi et al., 2024).
Swarm adaptation: For highly dynamic or task-varying swarms, modular approaches are adopted: the backbone or "common unit" can be generalized and frozen, with rapid adaptation of lightweight task-specific modules corresponding to new tasks, SNR regimes, or agent arrivals/departures (Razlighi et al., 2024, Xu et al., 2 Nov 2025).
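
A rough picture of attention-style intermediate fusion: the ego agent scores its own and every received feature vector against its own features, then takes a softmax-weighted sum. This is a minimal sketch with no learned query, key, or value projections and with one flat vector per agent, so it only illustrates the weighting mechanism, not the fusion modules of the cited works. The function names are hypothetical.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]

def attention_fuse(ego, received):
    """Scaled dot-product attention with the ego features as the query and
    the ego plus received features as keys and values.
    Returns (fused_vector, attention_weights)."""
    candidates = [ego] + received
    scale = math.sqrt(len(ego))
    scores = [sum(q * k for q, k in zip(ego, c)) / scale for c in candidates]
    weights = softmax(scores)
    fused = [sum(w * c[j] for w, c in zip(weights, candidates))
             for j in range(len(ego))]
    return fused, weights

ego = [1.0, 0.0, 0.5]
received = [[0.9, 0.1, 0.4],    # peer with a similar view
            [-1.0, 0.2, 0.0]]   # peer with a conflicting view
fused, weights = attention_fuse(ego, received)
print("weights:", [round(w, 3) for w in weights])
```

Because the weights depend only on the vectors present, the same function handles agents joining or leaving without retraining, which is the property that makes attention attractive for varying swarm sizes.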

5. LLM-Driven and Goal-Oriented Semantic Protocols

LLMs enable high-level semantic compression and coordination in swarms with heterogeneous platforms and sensors.

LLM-driven semantic tokenization: Raw observations are compressed into human-interpretable tokens (intent, world state, objects of interest) which serve as inputs to fuzzy-control or path-planning modules (Xu et al., 2 Nov 2025, Lin et al., 16 Aug 2025).
Prompt-based execution: System and instruction prompts bring semantic consistency, while role-adaptive compression (commander, relay, executor) ensures task-directed communication under complex, multi-hop, and bandwidth-limited topologies (Lin et al., 16 Aug 2025).
Token-based protocols: Agents exchange short packets of semantic tokens summarizing observations, intents, and sub-goal assignments (regions to explore or actions to execute). Robustness is achieved through checksums, majority voting, and prioritizing intent/context over raw data to reduce channel load (Xu et al., 2 Nov 2025, White et al., 2019).
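
The checksum and majority-voting mechanisms can be sketched with the standard library. The packet layout (a 4-byte CRC32 followed by a JSON payload) and the token fields `intent` and `region` are assumptions made for illustration; the cited protocols may frame and vote on tokens differently.

```python
import json
import zlib
from collections import Counter

def encode_packet(tokens):
    """Serialize a token dict and prepend a 4-byte CRC32 checksum."""
    payload = json.dumps(tokens, sort_keys=True).encode("utf-8")
    return zlib.crc32(payload).to_bytes(4, "big") + payload

def decode_packet(packet):
    """Return the token dict, or None when the checksum does not match."""
    if len(packet) < 4:
        return None
    checksum, payload = packet[:4], packet[4:]
    if zlib.crc32(payload).to_bytes(4, "big") != checksum:
        return None
    return json.loads(payload.decode("utf-8"))

def majority_vote(values):
    """Most common non-None value, or None if nothing valid was received."""
    valid = [v for v in values if v is not None]
    return Counter(valid).most_common(1)[0][0] if valid else None

# Three neighbours report an intent; one packet is corrupted in transit.
packets = [encode_packet({"intent": "explore", "region": "B2"}),
           encode_packet({"intent": "explore", "region": "B2"}),
           encode_packet({"intent": "hold", "region": "B2"})]
corrupted = bytearray(packets[2])
corrupted[-1] ^= 0xFF                      # flip bits in the last byte
packets[2] = bytes(corrupted)

decoded = [decode_packet(p) for p in packets]
intents = [d["intent"] if d else None for d in decoded]
print(majority_vote(intents))  # explore
```

Dropping packets that fail the checksum before voting keeps a single corrupted message from outvoting consistent neighbours.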

6. Explicit Performance Metrics, Simulation Results, and Design Principles

The cited works evaluate semantic communication for cooperative swarms with the following key metrics:

Metric	Typical Source	Definition/Range
Compression Ratio (CR)	(Sheng et al., 2023, Lin et al., 16 Aug 2025)	Nonzero fraction of spatial/temporal regions transmitted (1–5%, 0.24–0.7)
Task Accuracy (e.g., AP)	(Sheng et al., 2023, Sheng et al., 2024)	3D object detection, AP at specified IoU thresholds
Semantic Preservation (SP)	(Lin et al., 16 Aug 2025)	BERTScore or cross-entropy between raw and compressed instructions
Joint cost (energy/radio)	(Silva et al., 2023)	$E_{\text{total}} = \alpha \bar B + \beta \bar J + \gamma$
Path Planning Accuracy	(Zhao et al., 8 Oct 2025)	$Q(\delta) = P\{p^* = \hat{p}^*\}$ , path-weight error, feasibility
Coverage Efficiency	(Xu et al., 2 Nov 2025)	OOI coverage ratio, density, efficiency

Results across these works show that:

Semantic+JSCC transmission outperforms digital schemes (LDPC+QAM), especially at low SNR and under varying wireless conditions, and avoids catastrophic failure at decoding thresholds (Sheng et al., 2023, Sheng et al., 2024).
Partial semantic maps suffice for most tasks; high redundancy allows operation at high compression with minimal loss in cooperative function (Sheng et al., 2023, Sheng et al., 2024).
Multi-task and modular schemes (e.g., CCMT) generalize well to SNR/runtime variations, scaling efficiently with parameters and providing lower error than single-task baselines at equivalent complexity (Razlighi et al., 2024).
Semantic-Functional architectures minimize energy costs by maximizing functional coverage per bit; event-triggered transmission approaches the performance of explicit communication while using far less than 10% of the radio energy (Silva et al., 2023).
LLM-based pipelines demonstrate success rates exceeding 0.93 in complex, bandwidth-constrained, multi-hop rescue scenarios (with SP/CR tradeoff) (Lin et al., 16 Aug 2025).

7. Extension Principles and Practical Guidelines

The literature suggests the following guidelines for implementing semantic communication in swarms:

Match the feature backbone and task head to each agent's sensor and mission (e.g., LiDAR/PointPillars for detection, ViT for path planning) (Sheng et al., 2023, Zhao et al., 8 Oct 2025).
Optimize the compression ratio per link, time step, and tasked agent: dynamically adapt masking or sparsity to link capacity, task priority, or the agent's betweenness in the collective plan (Sheng et al., 2023, Zhao et al., 8 Oct 2025).
Leverage modular training: pretrain global backbones, fine-tune task-specific heads, and adapt to new tasks or team members by swapping/adding appropriate network modules (Razlighi et al., 2024).
Integrate physical-layer and semantic objectives: co-design the encoder-decoder and channel simulator so that the pipeline stays differentiable, and regularize with both bit-level and task-specific (semantic) losses (Sheng et al., 2023, Sheng et al., 2024, Zhao et al., 8 Oct 2025).
Adopt decentralized, role-adaptive, or event-driven protocols, using LLMs or goal-based messaging to further compress and prioritize only the most action-critical information (Xu et al., 2 Nov 2025, Lin et al., 16 Aug 2025, Silva et al., 2023).
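
The event-driven guideline can be illustrated with a send-on-change rule: an agent transmits only when its observation has moved by more than a threshold since the last value it sent. The trigger rule, the scalar observations, and the function name are assumptions for illustration and are not the specific scheme of Silva et al.

```python
def event_triggered_updates(observations, threshold):
    """Return the (step, value) pairs that would actually be transmitted:
    the first observation, then any value differing from the last sent
    value by more than the threshold."""
    sent = []
    last_sent = None
    for step, value in enumerate(observations):
        if last_sent is None or abs(value - last_sent) > threshold:
            sent.append((step, value))
            last_sent = value
    return sent

observations = [0.0, 0.05, 0.1, 0.5, 0.52, 1.2]
sent = event_triggered_updates(observations, threshold=0.2)
print(sent)                              # [(0, 0.0), (3, 0.5), (5, 1.2)]
print(len(sent) / len(observations))     # 0.5
```

Here half of the updates are suppressed; receivers hold the last sent value in between, so the threshold directly sets the trade-off between radio usage and staleness.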

References

Core sources:

Semantic encoding with importance maps and JSCC: (Sheng et al., 2023, Sheng et al., 2024)
Multi-task split-encoder/decoder for distributed sources: (Razlighi et al., 2024)
LLM-based semantic compression and execution: (Lin et al., 16 Aug 2025, Xu et al., 2 Nov 2025)
Semantic-functional event-triggered communication: (Silva et al., 2023)
Semantic communication for path planning: (Zhao et al., 8 Oct 2025)
Goal-based pheromone/message protocols: (White et al., 2019)

The integration of importance-map-guided compression, end-to-end semantic coding, error-aware physical-layer adaptation, and modular cooperative fusion forms the technical state of the art in semantic communication for cooperative swarms. These approaches provide a rigorous foundation for robust, scalable, and efficient multi-agent collaboration under practical wireless constraints.