Multi-Modal Beamforming with Model Compression and Modality Generation for V2X Networks (2506.22469v1)
Abstract: Integrated sensing and communication (ISAC) has emerged as a cornerstone technology for predictive beamforming in 6G-enabled vehicle-to-everything (V2X) networks. However, existing ISAC paradigms rely solely on radio frequency (RF) signals, limiting sensing resolution and robustness in V2X environments with high mobility and multipath interference. Fortunately, the widespread deployment of diverse non-RF sensors such as cameras and LiDAR, along with the integration of AI and communication systems, offers new opportunities to improve the synergy between sensing and communication. Motivated by this, we develop a novel and robust communication framework that leverages multi-modal sensing data and advanced AI techniques to assist beamforming in dynamic and realistic vehicular scenarios. Specifically, we propose a multi-modal learning framework for predictive beamforming that integrates modality-specific branches and employs a hierarchical Transformer to capture cross-modal features. By exploiting the intrinsic correlation between multi-modal sensing data and beamforming decisions, this design enhances the accuracy and robustness of beamforming in dynamic V2X scenarios. To enable practical deployment on resource-constrained edge devices (i.e., roadside units), we then develop a module-aware compression strategy that significantly reduces inference latency while preserving model performance. Furthermore, to handle missing modalities in real-world scenarios, we introduce a generative model that reconstructs missing inputs from the available observations, allowing the framework to operate reliably even under incomplete sensing conditions. Extensive simulations on real-world datasets demonstrate that the proposed scheme consistently outperforms existing baselines across various metrics.
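The abstract does not specify the fusion architecture in detail. As a rough illustration of the described design (modality-specific branches feeding a shared Transformer that fuses cross-modal features into a beam prediction), the following PyTorch sketch shows one plausible arrangement. All module names, input dimensions, layer counts, and the 64-beam codebook size are illustrative assumptions, not the paper's actual implementation.

```python
# Minimal sketch of a multi-modal predictive-beamforming network:
# modality-specific branches produce token embeddings that a shared
# Transformer encoder fuses before a head scores codebook beams.
# All shapes and sizes below are illustrative assumptions.
import torch
import torch.nn as nn

class MultiModalBeamPredictor(nn.Module):
    def __init__(self, d_model=128, n_beams=64):
        super().__init__()
        # Modality-specific branches (hypothetical feature dimensions):
        # RF features, camera embedding, and LiDAR embedding are each
        # projected into a shared token space of width d_model.
        self.rf_branch = nn.Sequential(nn.Linear(64, d_model), nn.ReLU())
        self.cam_branch = nn.Sequential(nn.Linear(512, d_model), nn.ReLU())
        self.lidar_branch = nn.Sequential(nn.Linear(256, d_model), nn.ReLU())
        # Shared Transformer encoder attends across the modality tokens.
        layer = nn.TransformerEncoderLayer(d_model=d_model, nhead=4,
                                           batch_first=True)
        self.fusion = nn.TransformerEncoder(layer, num_layers=2)
        self.head = nn.Linear(d_model, n_beams)  # logits per codebook beam

    def forward(self, rf, cam, lidar):
        # Stack one token per modality: (batch, 3, d_model)
        tokens = torch.stack([self.rf_branch(rf),
                              self.cam_branch(cam),
                              self.lidar_branch(lidar)], dim=1)
        fused = self.fusion(tokens)            # cross-modal attention
        return self.head(fused.mean(dim=1))    # pooled logits over beams

# Usage with dummy batched features for each modality:
model = MultiModalBeamPredictor()
logits = model(torch.randn(8, 64), torch.randn(8, 512), torch.randn(8, 256))
best_beam = logits.argmax(dim=-1)  # predicted beam index per sample
```

In the paper's full pipeline, the module-aware compression strategy would shrink such a model for roadside-unit deployment, and the generative model would synthesize a substitute token for any branch whose sensor input is missing before fusion; neither component is sketched here.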