
DSC-UAV: Adaptive Semantic Communication

Updated 11 January 2026
  • DSC-UAV is a digital semantic communication framework that integrates prompt-guided image encoding, learned quantization, and amplify-and-forward UAV relaying for efficient data transmission in bandwidth-constrained environments.
  • It employs TQC reinforcement learning to jointly optimize UAV trajectories and resource allocation, minimizing Age of Information while maximizing semantic-structural similarity.
  • Empirical evaluations demonstrate up to 22% SSS improvement and 14% AoI reduction over traditional methods, underscoring its potential in mission-critical applications.

The DSC-UAV model specifies a context-adaptive Digital Semantic Communication framework for Unmanned Aerial Vehicle (UAV) networks, targeting efficient, mission-centric data transfer in bandwidth-constrained environments such as smart city surveillance. The system combines prompt-guided semantic image encoding, digital quantization, amplify-and-forward (AF) UAV relaying, and user mobility-aware resource optimization via Truncated Quantile Critic (TQC) reinforcement learning. Its goal is to minimize Age of Information (AoI) and maximize semantic-structural similarity (SSS) in multi-user scenarios, and it is reported to outperform both traditional digital and existing semantic communication strategies (Joshi et al., 4 Jan 2026).

1. System Architecture and Data Flow

The DSC-UAV architecture comprises ground users (GUs) equipped with semantic transmitters, a UAV relay fleet, and a centralized processing server.

  • Semantic Transmitter: Each GU generates an image $I_m(k)$ and an associated prompt $T_m$ per transmission event. Prompt-aware semantic encoding is performed by a Vision Transformer (ViT) jointly conditioned on the prompt via a CLIP-based text encoder.
  • Quantization and Digital Mapping: Extracted semantic features $s_m(k)$ are discretized by a self-attention-based quantizer into codewords $Z_m(k)$, then mapped to OFDM symbols with IDFT.
  • UAV Relaying: $N$ UAV nodes provide parallel amplify-and-forward (AF) relaying of encoded signals over orthogonal subcarriers.
  • Centralized Semantic Decoding and Mobility Controller: The server reconstructs the target image via a prompt-aware CNN decoder and manages user/UAV mobility using TQC-based reinforcement learning for joint trajectory and resource allocation.

The data pipeline can be represented as:

$$\text{GU } m:\ [I_m(k),\ T_m] \rightarrow \text{ViT + Text Encoder} \rightarrow s_m(k) \rightarrow Q(\cdot) \rightarrow Z_m(k) \rightarrow \text{IDFT} \rightarrow x_m(k)$$

Data are then relayed by UAVs and reconstructed centrally (Joshi et al., 4 Jan 2026).

2. Prompt-Aware Semantic Encoding

Semantic encoding is realized by integrating prompt-text tokens using a modified ViT backbone.

  • Patch Embedding: The image $I \in \{0,\ldots,255\}^{C \times H \times W}$ is partitioned into patches of $2^d \times 2^d$ pixels, where $d$ is the compression ratio (e.g., $d \in [1,4]$).
  • CLIP-Conditioned Processing: Textual prompt tokens $T \in \mathbb{R}^{L \times C}$ are incorporated at every transformer stage.
  • Cross-Attention Injection: At block $b$, output is updated as:

$$\begin{aligned}
X^{(b+\frac{1}{3})} &= X^{(b)} + \mathrm{MSA}(\mathrm{LN}(X^{(b)})), \\
X^{(b+\frac{2}{3})} &= X^{(b+\frac{1}{3})} + g_b \cdot \mathrm{CrossAttn}(\mathrm{LN}(X^{(b+\frac{1}{3})});\ K=T,\ V=T), \\
X^{(b+1)} &= X^{(b+\frac{2}{3})} + \mathrm{MLP}(\mathrm{LN}(X^{(b+\frac{2}{3})}))
\end{aligned}$$

where $g_b$ is a learnable prompt-adaptive gate.

  • Loss and Similarity: Training targets a multi-scale structural similarity (MS-SSIM) loss, and the system objective combines semantic cosine similarity with MS-SSIM, weighted by $\alpha_s$:

$$\mathrm{SSS} = \alpha_s \cdot \mathrm{CosSim}(s, \hat{s}) + (1-\alpha_s) \cdot \text{MS-SSIM}(I, \hat{I})$$

This architecture enables prompt-guided abstraction ranging from generic to object-centric semantics (Joshi et al., 4 Jan 2026).
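The SSS metric above can be illustrated with a minimal sketch. It assumes the MS-SSIM value has already been computed by some external routine and is passed in as a number; the function names, the default $\alpha_s = 0.5$, and the zero-vector handling are illustrative assumptions, not values from the source.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # assumption: a zero vector counts as "no similarity"
    return dot / (norm_u * norm_v)


def sss(s, s_hat, ms_ssim, alpha_s=0.5):
    """SSS = alpha_s * CosSim(s, s_hat) + (1 - alpha_s) * MS-SSIM(I, I_hat).

    ms_ssim is a precomputed MS-SSIM score for the image pair (I, I_hat);
    alpha_s=0.5 is an illustrative default, not a value from the source.
    """
    if not 0.0 <= alpha_s <= 1.0:
        raise ValueError("alpha_s must lie in [0, 1]")
    return alpha_s * cosine_similarity(s, s_hat) + (1.0 - alpha_s) * ms_ssim


# Identical semantic features (CosSim = 1) combined with an MS-SSIM of 0.9.
score = sss([1.0, 0.0, 2.0], [1.0, 0.0, 2.0], ms_ssim=0.9, alpha_s=0.5)
print(round(score, 4))
```

Setting `alpha_s` closer to 1 weights agreement of the semantic features more heavily, while values closer to 0 weight pixel-level structural fidelity.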

3. Digital Quantization and Channel Transmission

Semantic features undergo learned quantization and digital transmission.

  • Soft-to-Hard Quantization: Transformer outputs $s$ are quantized to codewords $Z = Q(s) \in \{1, \dots, K\}^{C \times (HW/2^{2d})}$ by a self-attention soft encoder (Gumbel-softmax in training, deterministic nearest codebook selection in inference).
  • Bit Payload: Each codeword uses $b = \lceil \log_2 K \rceil$ bits; total semantic payload is $D_{sem} = b \cdot C \cdot (HW/2^{2d})$.
  • OFDM Mapping: Quantized symbols are transformed with IDFT for time-domain transmission over OFDM subcarriers.
  • Channel Coding: The model is compatible with standard digital block codes (e.g., LDPC, polar). Simulations employed a CRC-aided BCH code at 1/2 rate.
  • Relay Protocol: All UAVs act as parallel AF relays over orthogonal subcarriers.

The channel is modeled as Nakagami-$m$ ($m=2$) fading at 2.4 GHz with a $10$ MHz uplink, a UAV transmit power of $200$ mW, and a noise floor of $-105$ dBm (Joshi et al., 4 Jan 2026).
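The bit payload formula and the inference-time nearest-codebook selection can be sketched as follows. This is a minimal illustration under stated assumptions: the function names are hypothetical, the example dimensions ($K=256$, $C=3$, $H=W=256$) are made up for the arithmetic, squared Euclidean distance is assumed as the "nearest" criterion, and indices are 0-based here whereas the text uses $\{1,\dots,K\}$.

```python
def bits_per_codeword(codebook_size):
    """b = ceil(log2 K), computed with integer arithmetic to avoid float error."""
    if codebook_size < 1:
        raise ValueError("codebook_size must be at least 1")
    return (codebook_size - 1).bit_length()


def semantic_payload_bits(codebook_size, channels, height, width, d):
    """D_sem = b * C * (H * W / 2^(2d)) for patches of 2^d x 2^d pixels."""
    patch = 2 ** d
    if height % patch or width % patch:
        raise ValueError("image sides must be multiples of the patch size 2^d")
    n_patches = (height // patch) * (width // patch)
    return bits_per_codeword(codebook_size) * channels * n_patches


def hard_quantize(features, codebook):
    """Deterministic nearest-codeword selection (inference path only).

    Assumes squared Euclidean distance; returns 0-based codeword indices.
    """
    indices = []
    for f in features:
        dists = [sum((a - b) ** 2 for a, b in zip(f, c)) for c in codebook]
        indices.append(dists.index(min(dists)))
    return indices


# Illustrative numbers only: K=256 codewords, C=3, a 256x256 image.
for d in (1, 2, 3, 4):
    print(d, semantic_payload_bits(256, 3, 256, 256, d))

print(hard_quantize([[0.1, 0.9], [0.8, 0.2]], [[0, 0], [0, 1], [1, 0]]))
```

Each increment of $d$ quarters the number of patches, so the payload drops by a factor of four per step; this is the knob the controller trades against reconstruction quality.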

4. Joint UAV Trajectory and Resource Optimization

Resource allocation and trajectory control are formulated as a reinforcement learning problem.

  • System Utility:

$$J = \omega_1 \mathrm{SSS} - \omega_2 \overline{\Delta}, \qquad \overline{\Delta} = \frac{1}{K} \sum_{k=1}^K \frac{1}{M} \sum_{m=1}^M \Delta_m(k)$$

where $\Delta_m(k)$ is the Age of Information for user $m$ and event $k$.

  • Constraints:
    • UAV collision avoidance: $\|s_n(t) - s_{n'}(t)\|_2 \geq D_{min}$
    • Energy budgets: $\sum_{k=1}^K [E_n^{Comm}(k) + E_n^{St}(k)] \leq E_{max}$
    • AoI upper bound: $\Delta_m(k) \leq 1/\lambda_m$
    • UAV mobility: $\|v_n(t+1) - v_n(t)\|_2 \leq V_{max}$
  • Optimization Variables: UAV trajectories $s_n(t)$, resource splits $\rho$, and encoder compression $d$.

The above formulation supports dynamic user mobility, adaptively steering UAV relays for both surveillance coverage and bandwidth efficiency (Joshi et al., 4 Jan 2026).
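As a rough illustration of the formulation above, the following sketch evaluates the utility $J$ and checks two of the listed constraints for a candidate solution. The function names, the data layout (`aoi[k][m]` for event $k$ and user $m$), and all numeric values are assumptions made for the example.

```python
import math


def mean_aoi(aoi):
    """Average AoI over K events and M users; aoi[k][m] is Delta_m(k)."""
    per_event = [sum(row) / len(row) for row in aoi]
    return sum(per_event) / len(per_event)


def system_utility(sss, aoi, w1, w2):
    """J = w1 * SSS - w2 * mean AoI."""
    return w1 * sss - w2 * mean_aoi(aoi)


def collision_free(positions, d_min):
    """True if every pair of UAV positions is at least d_min apart."""
    for i in range(len(positions)):
        for j in range(i + 1, len(positions)):
            if math.dist(positions[i], positions[j]) < d_min:
                return False
    return True


def aoi_within_bound(aoi, rates):
    """True if Delta_m(k) <= 1 / lambda_m for every user m and event k."""
    return all(a <= 1.0 / lam for row in aoi for a, lam in zip(row, rates))


# Two events, two users (AoI in seconds); weights are illustrative.
aoi = [[4.0, 5.0], [3.0, 4.0]]
print(system_utility(sss=0.8, aoi=aoi, w1=1.0, w2=0.1))
print(collision_free([(0.0, 0.0, 100.0), (3.0, 4.0, 100.0)], d_min=5.0))
print(aoi_within_bound(aoi, rates=[0.2, 0.2]))
```

In a learning-based solver such as the one described here, checks like these would typically feed penalty terms in the reward rather than act as hard filters.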

5. Truncated Quantile Critic (TQC) Reinforcement Learning

The optimization utilizes TQC, a distributional RL algorithm enhancing stability for continuous control tasks.

  • MDP Specification:
    • State: UAV/GU positions, velocities, bitloads, channel gains, and energy.
    • Action: (movement direction, distance), compression factor $d$, resource splits $\{\rho_m^n(k)\}$.
    • Reward: $R_{sys} = \beta \cdot \min_{m,k} \mathrm{SSS}_{m,k} - \overline{\Delta}$, penalized for collisions, energy violations, or deadline misses.
  • TQC Updates:
    • $K$ critics, each with $N$ quantile heads $Z_{j,i}$; the target is aggregated as the average of the bottom $KN - d_{trunc}$ sorted target quantiles, reducing overestimation bias.
    • Critic loss:

    $$\mathcal{L}_{Q_j}(\psi_j) = \mathbb{E} \left[ \sum_{i=1}^N \mathrm{Huber}_\kappa \left( Z_{j,i}(s,a;\psi_j) - y_t \right) \right]$$

    • Actor loss:

    $$\mathcal{L}_\pi(\phi) = \mathbb{E}_{s \sim D} \left[ \alpha \log \pi(a|s) - \mathrm{TruncQValue}(s,a) \right]$$

    • Target networks are updated by Polyak averaging.

The source reports that TQC yields 10–15% lower AoI and 5–8% higher SSS than Soft Actor-Critic (SAC) and TD3 baselines, which it attributes to the distributional critic design and joint quantile truncation (Joshi et al., 4 Jan 2026).
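The truncation, critic loss, and target update described in this section can be sketched in a few lines. This mirrors the simplified forms given here (a scalar truncated target and a plain Huber penalty per quantile head) and omits the reward, discount, and entropy terms of a full temporal-difference target; function names and the default `kappa` and `tau` values are illustrative assumptions.

```python
def truncated_target(target_quantiles, d_trunc):
    """Pool the quantiles of all critics, sort them, drop the d_trunc
    largest, and average the remaining K*N - d_trunc values."""
    pooled = sorted(z for critic in target_quantiles for z in critic)
    if not 0 <= d_trunc < len(pooled):
        raise ValueError("d_trunc must leave at least one quantile")
    kept = pooled[: len(pooled) - d_trunc]
    return sum(kept) / len(kept)


def huber(x, kappa=1.0):
    """Huber penalty: quadratic near zero, linear beyond kappa."""
    ax = abs(x)
    return 0.5 * x * x if ax <= kappa else kappa * (ax - 0.5 * kappa)


def critic_loss(quantiles, y, kappa=1.0):
    """Sum over one critic's N quantile heads of Huber_kappa(Z_i - y),
    for a single (state, action) sample."""
    return sum(huber(z - y, kappa) for z in quantiles)


def polyak_update(target_params, online_params, tau=0.005):
    """Soft target update: target <- (1 - tau) * target + tau * online."""
    return [(1.0 - tau) * t + tau * o for t, o in zip(target_params, online_params)]


# K=2 critics with N=3 quantile heads each; drop the 2 largest pooled values.
y = truncated_target([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]], d_trunc=2)
print(y)
print(critic_loss([2.0, 3.0, 5.0], y))
print(polyak_update([0.0, 1.0], [1.0, 1.0], tau=0.5))
```

Dropping the largest pooled quantiles biases the target downward, which is the mechanism by which truncation counteracts overestimation; `d_trunc` controls how aggressive that correction is.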

6. Performance Metrics and Comparative Evaluation

The DSC-UAV framework is evaluated on Age of Information (AoI) and minimum semantic-structural similarity (SSS).

Each cell lists AoI (s), minimum SSS.

| SNR (dB) | DSC+TQC | D+TQC | SC+TQC | DSC+SAC | DSC+TD3 |
|---|---|---|---|---|---|
| 0 | 4.7, 0.76 | 5.3, 0.64 | 5.6, 0.72 | 5.0, 0.70 | 5.1, 0.71 |
| 5 | 4.0, 0.83 | 4.8, 0.69 | 5.1, 0.78 | 4.4, 0.76 | 4.5, 0.77 |
| 10 | 3.4, 0.91 | 3.9, 0.75 | 4.1, 0.88 | 3.7, 0.85 | 3.6, 0.87 |
| 15 | 3.1, 0.93 | 3.6, 0.78 | 3.8, 0.89 | 3.5, 0.87 | 3.4, 0.89 |
| 20 | 2.9, 0.94 | 3.4, 0.80 | 3.6, 0.90 | 3.3, 0.89 | 3.2, 0.90 |

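For the 5–10 dB SNR range, the source reports a 14% AoI reduction and a 22% SSS increase for DSC+TQC over digital communication with TQC (D+TQC). In the table above, the 5 dB row shows AoI falling from 4.8 s to 4.0 s (about 16.7%) and SSS rising from 0.69 to 0.83 (about 20.3%), while the 10 dB row shows AoI falling from 3.9 s to 3.4 s (about 12.8%) and SSS rising from 0.75 to 0.91 (about 21.3%). Gains over semantic-only (SC+TQC) and other reinforcement learning baselines (DSC+SAC, DSC+TD3) are also consistently observed (Joshi et al., 4 Jan 2026).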
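The relative gains can be recomputed directly from the tabulated values, which is a useful check because per-row percentages need not match the headline figures exactly. The sketch below copies the DSC+TQC and D+TQC columns from the table; the dictionary layout and function name are illustrative.

```python
# (AoI in seconds, minimum SSS) per SNR in dB, copied from the table above.
results = {
    "DSC+TQC": {0: (4.7, 0.76), 5: (4.0, 0.83), 10: (3.4, 0.91),
                15: (3.1, 0.93), 20: (2.9, 0.94)},
    "D+TQC": {0: (5.3, 0.64), 5: (4.8, 0.69), 10: (3.9, 0.75),
              15: (3.6, 0.78), 20: (3.4, 0.80)},
}


def relative_gains(method, baseline, snr_db):
    """Return (AoI reduction %, SSS increase %) of method over baseline."""
    aoi, sss = results[method][snr_db]
    aoi_base, sss_base = results[baseline][snr_db]
    aoi_reduction = 100.0 * (aoi_base - aoi) / aoi_base
    sss_increase = 100.0 * (sss - sss_base) / sss_base
    return aoi_reduction, sss_increase


for snr in sorted(results["DSC+TQC"]):
    reduction, increase = relative_gains("DSC+TQC", "D+TQC", snr)
    print(f"{snr:>2} dB: AoI -{reduction:.1f}%, SSS +{increase:.1f}%")
```

The same function can be pointed at any other pair of columns once they are added to `results`.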

7. Technical Significance and Research Implications

The DSC-UAV model integrates context-driven prompt injection, digital semantic quantization, and reinforcement learning for unified resource optimization in airborne relaying scenarios. Its ability to extract, transmit, and reconstruct semantically relevant features under bandwidth constraints targets the requirements of latency-sensitive, information-centric surveillance. The use of TQC reinforcement learning is intended to keep control robust in the presence of user mobility and dynamic wireless channels. The source documents model architectures, objective formulations, hyperparameter settings, and simulation benchmarks to support replication and further research into adaptive semantic communication for UAV networks (Joshi et al., 4 Jan 2026).
