
Adapt Tokenization and Patching for Wireless Time-Series in Physical-Layer Foundation Models

Develop tokenization and patching strategies for physical-layer wireless time-series data (for example, IQ samples, Channel State Information, chirps, and FFT-derived statistics) that can robustly accommodate varying entropy, sampling rates, and sequence lengths across diverse wireless technologies and use cases, and that yield a consistent common embedding space suitable for transformer-based wireless foundation models.


Background

The paper envisions a wireless physical-layer foundation model trained with self-supervised objectives, which requires transforming heterogeneous wireless time-series into a common embedding or tokenization space. Unlike text tokens in NLP, wireless signals exhibit highly variable entropy and representations across technologies (e.g., LTE, 5G-NR, Wi‑Fi) and tasks, which complicates patch-size selection and tokenization.

Tokenization methods for general time-series have been explored, but the paper notes the lack of strategies tailored to wireless time-series, where differences in sampling rates, sequence lengths, and data variability are pronounced. A robust solution is needed to enable unified transformer-based modeling across telecom use cases.
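For context, the sketch below shows one common approach from the general time-series literature, PatchTST-style patching, applied to complex IQ samples: overlapping windows of samples are flattened and linearly projected into a fixed-dimension token space. The class name, patch length, stride, and embedding size are illustrative assumptions, not choices made in the paper.

# Minimal, illustrative sketch (not from the paper): PatchTST-style patching of
# complex IQ samples into fixed-dimension tokens for a transformer encoder.
# Class name, patch_len, stride, and d_model are assumed values for illustration.
import torch
import torch.nn as nn


class IQPatchTokenizer(nn.Module):
    """Splits a complex IQ stream into overlapping patches and linearly
    projects each patch into a shared d_model-dimensional token space."""

    def __init__(self, patch_len: int = 64, stride: int = 32, d_model: int = 256):
        super().__init__()
        self.patch_len = patch_len
        self.stride = stride
        # Each complex sample contributes two real features (I and Q).
        self.proj = nn.Linear(2 * patch_len, d_model)

    def forward(self, iq: torch.Tensor) -> torch.Tensor:
        # iq: (batch, seq_len) complex tensor -> tokens: (batch, n_patches, d_model)
        x = torch.view_as_real(iq)                    # (B, T, 2)
        x = x.unfold(dimension=1, size=self.patch_len,
                     step=self.stride)                # (B, n_patches, 2, patch_len)
        x = x.flatten(start_dim=-2)                   # (B, n_patches, 2 * patch_len)
        return self.proj(x)


tokenizer = IQPatchTokenizer(patch_len=64, stride=32, d_model=256)
iq_burst = torch.randn(4, 4096, dtype=torch.cfloat)  # 4 synthetic bursts of 4096 IQ samples
tokens = tokenizer(iq_burst)
print(tokens.shape)  # torch.Size([4, 127, 256])

A fixed patch length like this implicitly assumes a fixed sampling rate and comparable signal statistics; the open question asks how to choose or adapt such patching when those assumptions break across LTE, 5G-NR, Wi‑Fi, and other waveforms and use cases.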

References

"Under the dynamic conditions of telecom use cases, it remains an open challenge how to effectively adapt these techniques due to differences in entropy and data representation."

Shahid et al., "Large-Scale AI in Telecom: Charting the Roadmap for Innovation, Scalability, and Enhanced Digital Experiences," arXiv:2503.04184, 6 Mar 2025; Section 13.1.13, LTM pre-training of a physical-layer foundation model.