Optimally Bridging Semantics and Data: Generative Semantic Communication via Schrödinger Bridge

Published 20 Apr 2026 in eess.IV and cs.CV | (2604.17802v1)

Abstract: Generative Semantic Communication (GSC) is a promising solution for image transmission over narrow-band and high-noise channels. However, existing GSC methods rely on long, indirect transport trajectories from a Gaussian to an image distribution guided by semantics, causing severe hallucination and high computational cost. To address this, we propose a general framework named Schrödinger Bridge-based GSC (SBGSC). By leveraging the Schrödinger Bridge (SB) to construct optimal transport trajectories between arbitrary distributions, SBGSC breaks Gaussian limitations and enables direct generative decoding from semantics to images. Within this framework, we design Diffusion SB-based GSC (DSBGSC). DSBGSC reconstructs the nonlinear drift term of diffusion models using Schrödinger potentials, achieving direct optimal distribution transport to reduce hallucinations and computational overhead. To further accelerate generation, we propose a self-consistency-based objective guiding the model to learn a nonlinear velocity field pointing directly toward the image, bypassing Markovian noise prediction to significantly reduce sampling steps. Simulation results demonstrate that DSBGSC outperforms state-of-the-art GSC methods, improving FID by at least 38% and SSIM by 49.3%, while accelerating inference speed by over 8 times.


Authors (6)

Summary

The paper proposes Schrödinger Bridge-based Generative Semantic Communication (SBGSC), which directly establishes an optimal transport trajectory from noisy semantic distributions to target data distributions, bypassing conventional Gaussian priors.
SBGSC significantly outperforms existing conditional diffusion model-based methods, achieving superior semantic perception quality with at least 38% FID improvement and 49.3% SSIM improvement at low CBRs due to direct semantic feature utilization and optimal transport.
The framework successfully suppresses generative hallucinations and reduces computational overhead by over 8 times thanks to a self-consistency-based objective that enables accelerated sampling with fewer Neural Function Evaluations (NFEs).


Generative Semantic Communication (GSC) is an emergent paradigm for image transmission over challenging narrow-band and high-noise channels. Existing GSC approaches, primarily based on conditional diffusion models (CDMs), often confront limitations stemming from their indirect generation process. These methods typically initiate generation from an independent Gaussian noise distribution and attempt to recover the target image distribution by leveraging conditional semantic guidance. This indirect transport mechanism leads to significant challenges, including semantic condition mismatch, high computational overhead due to numerous iterative denoising steps, and the pervasive issue of generative hallucinations, wherein the model synthesizes non-existent or erroneous details (Figure 1).

Figure 1: Comparison between existing conditional diffusion model-based generative semantic communication and the proposed Schrödinger Bridge-based generative semantic communication (SBGSC).

To overcome these inherent inefficiencies, a novel framework, Schrödinger Bridge-based GSC (SBGSC), is proposed. SBGSC fundamentally redefines the generative decoding process by directly establishing an optimal transport trajectory from the received noisy semantic distribution to the target data distribution, bypassing the conventional Gaussian prior. This direct approach, grounded in Schrödinger Bridge (SB) theory, offers distinct advantages. Firstly, SBGSC leverages abstract semantic features directly, eliminating the need for rigid intermediate modalities (e.g., text or edge maps) and thereby mitigating hallucination risks caused by modality bias. Secondly, by employing semantics as the direct starting state for generation, the optimal transport path inherently constrains the trajectory to closely follow the real data manifold, preserving structural consistency and preventing randomized hallucinations. Thirdly, this direct path significantly reduces the number of sampling steps required, leading to substantial reductions in computational cost and inference latency.

SBGSC Framework and Theoretical Underpinnings

The SBGSC system is an end-to-end deep learning communication framework designed for high-fidelity semantic transmission across adverse channel environments. It comprises a joint source-channel semantic encoder, a wireless physical channel, and a core SB-based generative decoder (Figure 2). The encoder, parameterized by $\phi$, maps high-dimensional input data $\mathbf{x}$ into robust semantic features $\mathbf{s}$. These features are compressed to meet channel bandwidth ratio (CBR) constraints and transmitted over the wireless channel, which may include Additive White Gaussian Noise (AWGN) and fading effects. The received corrupted semantics, $\hat{\mathbf{s}}$, are then fed into the SB-based generative decoder, parameterized by $\theta$, which outputs the reconstruction $\hat{\mathbf{x}}$ of the original data.

Figure 2: Schematic diagram of the proposed SBGSC framework. The framework mainly consists of a joint source-channel semantic encoder, a wireless physical channel, and a core SB-based generative decoder.
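The channel stage of this pipeline can be illustrated with a minimal sketch. It assumes the convention common in the DeepJSCC literature that CBR = k / n, where n is the number of source values (height × width × channels) and k is the number of complex channel symbols; the summary does not state the paper's exact definition. The function names `channel_symbols` and `awgn` are hypothetical and are not taken from the paper.

```python
import math
import random


def channel_symbols(height, width, channels, cbr):
    """Number of complex channel symbols k, assuming CBR = k / n with n = H*W*C."""
    n = height * width * channels
    return max(1, round(n * cbr))


def awgn(symbols, snr_db, rng):
    """Add circularly symmetric complex Gaussian noise at the requested SNR (in dB).

    The noise power is set relative to the measured average power of `symbols`.
    """
    signal_power = sum(abs(s) ** 2 for s in symbols) / len(symbols)
    noise_power = signal_power / (10 ** (snr_db / 10))
    sigma = math.sqrt(noise_power / 2)  # standard deviation per real dimension
    return [s + complex(rng.gauss(0, sigma), rng.gauss(0, sigma)) for s in symbols]
```

Under this assumed definition, a 256 × 256 RGB image at CBR = 1/192 is carried by `channel_symbols(256, 256, 3, 1 / 192)` complex symbols. At SNR = -10 dB, the noise power produced by `awgn` is ten times the signal power, which indicates how harsh the low-SNR regime in the experiments is.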

A rigorous theoretical analysis from an information-theoretic standpoint demonstrates the superiority of SBGSC over mainstream CDM-based GSC methods. Lemma 1 establishes an information capacity inequality, showing that unconstrained Joint Source-Channel Coding (JSCC) semantic representations retain provably no less mutual information about the source than modality-constrained condition signals, thereby formalizing the information bottleneck imposed by predefined modality priors. This implies that modality constraints limit the semantic encoder's ability to fully capture task-relevant information.

Further, under a distributional distance assumption (Assumption 1), which posits that the received semantic features retain more principal informative structure of the source data than an uninformative Gaussian prior when projected into image space, Theorem 1 proves that the Schrödinger Bridge transport in SBGSC achieves strictly lower Path Kinetic Energy (PKE) than the Gaussian-initialized transport of CDM-based methods. PKE quantifies the cumulative magnitude of the drift field along the generative trajectory, with smaller values indicating shorter, smoother paths requiring fewer sampling iterations. This advantage arises from both a source-informed initialization and a variationally optimal transport path.
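One common way to write such a quantity, given here as an assumed form rather than the paper's exact definition, is the expected squared norm of the drift integrated over the trajectory. The symbol $\mathbf{v}_t$ for the drift field and the unit time horizon are illustrative choices:

$$
\mathrm{PKE} \;=\; \mathbb{E}\left[\int_0^1 \left\| \mathbf{v}_t(\mathbf{x}_t) \right\|^2 \, dt\right]
$$

As a sanity check under this assumed form, consider a straight-line path that moves from a starting point $\mathbf{x}_1$ to the image $\mathbf{x}_0$ at constant velocity $\mathbf{v}_t = \mathbf{x}_0 - \mathbf{x}_1$:

$$
\mathrm{PKE} \;=\; \mathbb{E}\left[\left\| \mathbf{x}_0 - \mathbf{x}_1 \right\|^2\right]
$$

The energy then depends only on how far the starting distribution lies from the data. This matches the intuition behind Assumption 1: a start that already carries source structure gives a shorter path than a start drawn from an independent Gaussian.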

These theoretical insights translate directly into communication performance guarantees. Corollary 1 indicates that SBGSC attains a strictly lower semantic hallucination rate with higher end-to-end mutual information, providing an information-theoretic justification for the observed hallucination suppression. Corollary 2 further demonstrates that SBGSC requires strictly fewer neural function evaluations (NFEs) for any prescribed generation accuracy tolerance, directly addressing the computational overhead inherent in existing CDM-based GSC systems.

Diffusion Schrödinger Bridge-based Generative Semantic Communication (DSBGSC)

A specific implementation within the SBGSC framework is Diffusion SB-based Generative Semantic Communication (DSBGSC). DSBGSC comprises a semantic encoder, channel modulation, channel demodulation, and a DSB-based decoder (Figure 3). The transmitter employs a Swin Transformer-based architecture for the semantic encoder, effectively capturing local and global semantic dependencies while performing SNR adaptation and rate control. The decoder leverages DSB to bridge low-dimensional semantic features with high-dimensional images. This is achieved by reformulating the backward SDE of the SB, adopting a zero-drift assumption, which simplifies the bridge posterior to an analytical Gaussian form. This allows for a direct reparameterization trick that expresses intermediate states as a linear interpolation between source data and semantic features, perturbed by bridge process uncertainty.

Figure 3: The overall architecture of the proposed DSBGSC. The optimal transfer from semantic distribution to data distribution is achieved via the least action principle of DSB. Meanwhile, the self-consistency property is incorporated, enabling any intermediate state to directly map to the source image for efficient few-step semantic perception.
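The reparameterization described above, in which an intermediate state is a linear interpolation between source data and semantic features perturbed by bridge uncertainty, can be sketched with a standard Brownian bridge. This is a minimal illustration that assumes a constant diffusion coefficient `sigma` and a semantic feature already projected to the same shape as the image; the paper's actual noise schedule and projection are not given in this summary. The name `bridge_sample` is hypothetical.

```python
import math
import random


def bridge_sample(x0, x1, t, sigma, rng):
    """Sample x_t from a zero-drift Brownian bridge pinned at x0 (t=0) and x1 (t=1).

    The marginal is Gaussian with mean (1 - t) * x0 + t * x1 and
    variance sigma^2 * t * (1 - t) per coordinate.
    Here x0 plays the role of the image and x1 the received semantics.
    """
    std = sigma * math.sqrt(t * (1 - t))
    return [(1 - t) * a + t * b + std * rng.gauss(0, 1) for a, b in zip(x0, x1)]
```

The variance vanishes at both endpoints, so the sample coincides with the image at t = 0 and with the semantic state at t = 1, and the uncertainty is largest midway along the bridge.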

A key innovation in DSBGSC is the integration of a self-consistency-based objective that guides the model to learn a nonlinear velocity field. This field points directly toward the target image, bypassing the traditional stepwise Markovian noise prediction. During training, a neural network predicts the normalized direction from the original source data to the current state, so that the source data can be implicitly recovered at any time step. This optimal prediction direction represents the shortest tangent vector along the SB optimal transport geodesic. During inference, the self-consistency property enables an accelerated sampling strategy, akin to Consistency Models: the decoder output is used to refine the estimate of the original data directly, and the next state is then sampled from the bridge posterior distribution.
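The inference loop described above can be sketched as follows. This is an illustration under stated assumptions, not the authors' sampler: it assumes a constant-`sigma` Brownian bridge, a uniform time grid from t = 1 to t = 0, and a hypothetical callable `predict_x0(x, t)` that stands in for the trained decoder and returns its current estimate of the source image.

```python
import math
import random


def posterior_sample(x0_hat, xt, s, t, sigma, rng):
    """Sample x_s (0 <= s < t) from the Brownian bridge pinned at x0_hat (time 0) and xt (time t)."""
    w = s / t
    std = sigma * math.sqrt(s * (t - s) / t)
    return [(1 - w) * a + w * b + std * rng.gauss(0, 1) for a, b in zip(x0_hat, xt)]


def few_step_decode(predict_x0, x1, nfe, sigma, rng):
    """Run `nfe` refinement steps starting from the received semantic state x1."""
    times = [1 - i / nfe for i in range(nfe + 1)]  # 1.0 down to 0.0
    x = list(x1)
    for t, s in zip(times[:-1], times[1:]):
        x0_hat = predict_x0(x, t)  # one neural function evaluation
        x = posterior_sample(x0_hat, x, s, t, sigma, rng)
    return x
```

Each iteration costs exactly one call to `predict_x0`, so the `nfe` argument equals the number of neural function evaluations. On the final step s = 0, the posterior collapses onto the last estimate of the source image, which is what is returned.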

Experimental Validation and Results

Experiments conducted across varying Channel Bandwidth Ratios (CBRs) and Signal-to-Noise Ratios (SNRs) on a subset of the ImageNet-1K dataset (for training) and the DIV2K validation set (for testing) demonstrate that DSBGSC significantly outperforms state-of-the-art DeepJSCC and conditional GSC baselines. Under an AWGN channel at SNR = 7dB, DSBGSC exhibits superior semantic perception quality across various CBRs (Figure 4, Figure 5). Specifically, at low CBRs (e.g., 1/192), DSBGSC achieves the best performance across distortion metrics (PSNR, MS-SSIM) and perceptual metrics (LPIPS, FID), with FID improving by at least 38% and SSIM by 49.3%. Visual comparisons clearly show that DSBGSC maintains remarkable semantic fidelity, with discernible object structures and textures even at extremely low bandwidths, where other methods suffer from severe degradation or structural collapse.

Figure 4: Visual comparison of semantic perception quality among different methods under AWGN channel at SNR = 7 dB.

Figure 5: Comparison of semantic perception quality among different methods at SNR = 7 dB when CBR varies.

When evaluating robustness against channel noise at a fixed CBR of 1/48, DSBGSC consistently achieves the best overall performance, demonstrating excellent degradation characteristics (Figure 6, Figure 7). Traditional methods exhibit a "cliff effect," failing completely at low SNRs, while existing generative semantic communication methods often misinterpret channel noise as conditional inputs, leading to severe hallucinations. DSBGSC maintains clear structural and semantic fidelity, even under extremely harsh conditions such as -10 dB SNR.

Figure 6: Visual comparison of semantic perception quality among different methods under AWGN channel with CBR = 1/48.

Figure 7: Comparison of semantic perception quality among different methods at CBR = 1/192 when SNR varies.

A crucial aspect of DSBGSC's performance is its effective hallucination suppression (Figure 8). Unlike other conditional generative methods that are highly susceptible to impaired features, DSBGSC's direct gradient guidance from semantics to data distribution significantly mitigates the generation of fabricated or erroneous objects.

Figure 8: Visual comparison of hallucination suppression. Red boxes highlight severe semantic hallucinations generated by BriGSC.

Furthermore, DSBGSC demonstrates significant efficiency gains. The generative process achieves high semantic perception quality with a dramatically reduced number of Neural Function Evaluations (NFEs), offering an inference speedup of over 8 times compared to some baselines (Figure 9, Table 1). Even with minimal iterations (e.g., NFE=10), DSBGSC provides high-quality image reconstruction, with subsequent iterations primarily refining texture details.

Figure 9: Generative processes for semantic and data distribution transfer with NFE=10. Each figure depicts the direct prediction performance of ${\mathbf{x}_0}$ at the current state during the generative evolution.

The robustness of DSBGSC extends to fading channels, where the performance gap with AWGN channels narrows with increasing SNR (Figure 10). The optimal transmission and self-consistency characteristics of the SB enable adaptive fitting of nonlinear distortions, mitigating semantic information degradation and achieving stable convergence even in deep fading scenarios.

Figure 10: Comparison of semantic perceptual quality under different channels.

Conclusion

The SBGSC framework, particularly its DSBGSC implementation, represents a substantial advancement in generative semantic communication for narrow-band and high-noise channels. By leveraging Schrödinger Bridge theory, it establishes an optimal, direct transport between semantic and data distributions, overcoming the limitations of conventional Gaussian-prior-based methods. This approach fundamentally reduces computational overhead and suppresses generative hallucinations, leading to superior perceptual quality and enhanced robustness. The theoretical and empirical benefits of SBGSC highlight its potential to improve high-fidelity image transmission in resource-constrained environments. Future research will explore constructing a bidirectional SB architecture to unify semantic encoding and decoding, aiming to approach the theoretical limits of semantic communication.
