Token-Domain Multiple Access (ToDMA)
- Token-Domain Multiple Access (ToDMA) is a semantic communications framework that encodes data into token sequences using shared pre-trained tokenization and modulation codebooks.
- It achieves high transmission efficiency with low end-to-end latency, attaining semantic quality (e.g., PSNR for images, BERTScore for text) close to that of orthogonal transmission.
- The framework employs compressed sensing, channel clustering, and masked token prediction via multimodal language models to effectively resolve over 90% of token collisions.
Token-Domain Multiple Access (ToDMA) is a large-model-driven grant-free multiple access framework for semantic communications in which a massive set of devices encode, transmit, and recover information in the token domain. ToDMA leverages pre-trained tokenization and modulation codebooks, compressed sensing for token detection, channel state information (CSI) clustering for user separation, and context-aware masked token prediction using multimodal LLMs (MLLMs) to resolve token collisions. This pipeline achieves high transmission efficiency and low end-to-end latency, with semantic-level quality approaching that of orthogonal schemes and surpassing context-unaware non-orthogonal communication schemes, across text and image modalities (Qiao et al., 16 May 2025, Qiao et al., 10 Feb 2025).
1. Architecture and Signal Processing Pipeline
ToDMA defines a unified transmitter and receiver pipeline implemented as follows:
- Tokenization (Source Coding): Each device encodes its source data (such as an image patch or text sequence) into a sequence of $N$ tokens $v_1, \dots, v_N$ via a shared learned tokenizer, where each $v_n$ indexes a common token codebook of size $Q$. Tokens are equivalently represented as one-hot vectors $\mathbf{e}_{v_n} \in \{0,1\}^{Q}$ (Qiao et al., 16 May 2025, Qiao et al., 10 Feb 2025).
- Modulation (Channel Coding): Token indices are mapped by a shared complex-valued modulation codebook $\mathbf{C} = [\mathbf{c}_1, \dots, \mathbf{c}_Q] \in \mathbb{C}^{L \times Q}$, so each device $k$ transmits over $N$ time slots the codeword matrix $\mathbf{S}_k = [\mathbf{c}_{v_{k,1}}, \dots, \mathbf{c}_{v_{k,N}}]$. Orthonormality among codewords ($\mathbf{c}_i^{\mathsf{H}}\mathbf{c}_j \approx \delta_{ij}$) is enforced for efficient projection and separation (Qiao et al., 10 Feb 2025).
- Channel Model: In a grant-free uplink, $K$ active devices simultaneously transmit to an $M$-antenna base station (BS). The received signal at slot $t$ is
$$\mathbf{Y}_t = \sum_{k=1}^{K} \mathbf{c}_{v_{k,t}} \mathbf{h}_k^{\mathsf{T}} + \mathbf{N}_t \in \mathbb{C}^{L \times M},$$
where $\mathbf{h}_k \in \mathbb{C}^{M}$ is the (slot-invariant) channel of device $k$ and $\mathbf{N}_t$ is i.i.d. additive white Gaussian noise (Qiao et al., 16 May 2025, Qiao et al., 10 Feb 2025).
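The transmitter chain and channel model above can be sketched numerically. This is a minimal toy, not the papers' implementation: the tokenizer is abstracted as given token indices, and the vocabulary size, codeword length, antenna count, device count, and sequence length are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
Q, L, M, K, N = 64, 64, 8, 3, 5  # vocab size, codeword length, antennas, devices, slots (toy values)

# Shared modulation codebook with orthonormal columns (via QR of a random complex matrix).
C, _ = np.linalg.qr(rng.standard_normal((L, Q)) + 1j * rng.standard_normal((L, Q)))

tokens = rng.integers(0, Q, size=(K, N))  # each device's token sequence (tokenizer abstracted away)
# Slot-invariant Rayleigh channels, one M-vector per device.
H = (rng.standard_normal((K, M)) + 1j * rng.standard_normal((K, M))) / np.sqrt(2)

def received_signal(t, snr_db=20.0):
    """Superimpose all devices' slot-t codewords at the M-antenna BS, plus AWGN."""
    Y = sum(np.outer(C[:, tokens[k, t]], H[k]) for k in range(K))
    noise_std = 10 ** (-snr_db / 20)
    Y += noise_std * (rng.standard_normal((L, M)) + 1j * rng.standard_normal((L, M))) / np.sqrt(2)
    return Y

Y0 = received_signal(0)
print(Y0.shape)  # (L, M)
```

Each slot's received matrix is the superposition of one codebook column per active device, scaled along the antenna dimension by that device's channel, which is exactly the structure the receiver later exploits.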
2. Receiver Design: Token Detection and Channel Clustering
The base station implements a multi-step recovery procedure:
- Compressed Sensing Token Detection: For each time slot $t$, the receiver projects $\mathbf{Y}_t$ onto the columns of $\mathbf{C}$, exploiting the model
$$\mathbf{Y}_t = \mathbf{C}\mathbf{X}_t + \mathbf{N}_t,$$
where $\mathbf{X}_t \in \mathbb{C}^{Q \times M}$ is row-sparse, its nonzero rows encoding which tokens are active. Approximate Message Passing (AMP) is employed to detect the active token set and estimate their CSI vectors (Qiao et al., 16 May 2025).
- Channel Clustering for Source Assignment: Each device's channel remains constant over the $N$ slots. For robust token-to-user assignment, the receiver clusters the estimated CSI vectors into $K$ groups using K-means++. Each detected token is then assigned to the closest cluster center (user), yielding a partially reconstructed token sequence per device (Qiao et al., 16 May 2025).
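The receiver side can be sketched as follows, under toy assumptions: a simple matched-filter projection with energy thresholding stands in for AMP, plain Lloyd iterations stand in for K-means++, and the channel is noiseless; dimensions are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)
Q, L, M, K, N = 64, 64, 8, 3, 6  # toy dimensions, not values from the source
C, _ = np.linalg.qr(rng.standard_normal((L, Q)) + 1j * rng.standard_normal((L, Q)))
tokens = rng.integers(0, Q, size=(K, N))
H = (rng.standard_normal((K, M)) + 1j * rng.standard_normal((K, M))) / np.sqrt(2)

def detect_slot(Y, thresh=0.5):
    """Matched-filter stand-in for AMP: project onto the codebook, keep high-energy rows."""
    X_hat = C.conj().T @ Y                       # (Q, M): row q ≈ sum of channels of devices sending token q
    energy = np.linalg.norm(X_hat, axis=1)
    active = np.where(energy > thresh * energy.max())[0]
    return active, X_hat[active]                 # detected token indices and their CSI estimates

# Gather CSI estimates over all slots, then cluster them into K user groups (Lloyd iterations).
csi = np.vstack([detect_slot(sum(np.outer(C[:, tokens[k, t]], H[k]) for k in range(K)))[1]
                 for t in range(N)])
centroids = csi[rng.choice(len(csi), K, replace=False)]
for _ in range(20):
    dist = np.linalg.norm(csi[:, None, :] - centroids[None], axis=2)
    labels = dist.argmin(axis=1)
    centroids = np.array([csi[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
                          for j in range(K)])
```

Because the channels are slot-invariant, CSI estimates of the same device cluster tightly across slots; assigning each detected token to its nearest centroid recovers per-device sequences except where collisions merge the CSI rows.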
3. Semantic Orthogonality and Collision Mitigation
- Semantic Orthogonality: Token sequences from different devices occupy distant regions in a learned embedding space, measured via low average cosine similarity or large distributional divergence: for devices $i \neq j$ with embedded sequences $\mathbf{z}_i, \mathbf{z}_j$,
$$\mathrm{sim}(i,j) = \frac{\langle \mathbf{z}_i, \mathbf{z}_j \rangle}{\|\mathbf{z}_i\|\,\|\mathbf{z}_j\|} \approx 0.$$
This separation enables disentangling of overlapping transmissions even when individual tokens collide (Qiao et al., 10 Feb 2025).
- Collision Handling via Masked Prediction: When multiple devices choose the same token in the same slot, clustering cannot resolve assignments, and the corresponding positions are masked ([MASK]). Each device's partial token sequence is completed using a pre-trained multimodal LLM (MLLM), e.g., BERT (text) or MaskGIT (image), by predicting candidate tokens in context, restricted to the detected collision set $\mathcal{T}_t$. This targeted search reduces masked token prediction complexity from $\mathcal{O}(Q)$ to $\mathcal{O}(|\mathcal{T}_t|)$ candidate evaluations per masked position (Qiao et al., 16 May 2025, Qiao et al., 10 Feb 2025).
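Candidate-restricted masked prediction can be illustrated with a toy contextual scorer; here a bigram frequency model stands in for the MLLM (in ToDMA this role is played by BERT or MaskGIT), and the corpus and token values are invented for the example.

```python
from collections import Counter

# Toy contextual scorer: bigram counts learned from a tiny corpus of token sequences
# (a stand-in for a pre-trained MLLM such as BERT or MaskGIT).
corpus = [[1, 2, 3, 4], [1, 2, 3, 5], [7, 2, 3, 4], [1, 2, 9, 4]]
bigrams = Counter((s[i], s[i + 1]) for s in corpus for i in range(len(s) - 1))

def fill_masked(seq, collision_set, mask=None):
    """Replace each masked position with the candidate (drawn only from the
    detected collision set) that best fits its unmasked neighbours."""
    out = list(seq)
    for i, tok in enumerate(out):
        if tok is not mask:
            continue
        def score(v):
            s = 0
            if i > 0 and out[i - 1] is not mask:
                s += bigrams[(out[i - 1], v)]       # fit with left context
            if i + 1 < len(out) and out[i + 1] is not mask:
                s += bigrams[(v, out[i + 1])]       # fit with right context
            return s
        out[i] = max(collision_set, key=score)      # O(|collision_set|) scorings, not O(Q)
    return out

print(fill_masked([1, None, 3, 4], collision_set={2, 9}))  # → [1, 2, 3, 4]
```

The key point carries over to the real system: the transformer only scores tokens known to have collided in that slot, so a large vocabulary never has to be searched exhaustively.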
4. Bayesian Interpretation and Receiver Inference
Maximum a posteriori (MAP) inference for the joint set of all tokens transmitted by all devices is formally intractable; ToDMA instead operationalizes an approximate MAP solution by chaining:
- Matched-filter token detection and energy thresholding;
- Nearest-CSI channel assignment (clustering);
- Contextual masked token re-prediction, as a conditional MAP estimate restricted to detected collision candidates, with the transformer acting as a probabilistic scorer (Qiao et al., 10 Feb 2025).
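In notation assumed here (not taken verbatim from the source), the chained approximation replaces the exact joint MAP with a per-position conditional MAP over the detected collision candidates:

```latex
% Exact joint MAP over all devices' token sequences -- intractable:
\{\hat{v}_{k,t}\} = \arg\max_{\{v_{k,t}\}} \; p\big(\{v_{k,t}\} \mid \mathbf{Y}_{1:N}\big)

% ToDMA's chained surrogate: per-slot detection and CSI assignment fix the
% unambiguous tokens; each masked position is then resolved by a conditional
% MAP restricted to the detected collision set \mathcal{T}_t, with the
% transformer p_\theta acting as the probabilistic scorer:
\hat{v}_{k,t} = \arg\max_{v \in \mathcal{T}_t} \; p_\theta\big(v \mid \text{unmasked tokens of device } k\big)
```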
5. Practical Implementation and Complexity
A single shared tokenizer and modulation codebook ensure coherent symbol mapping and efficient grant-free random access across a large device set. Transformer-based tokenizers provide context-sensitive token quantization, increasing semantic robustness (Qiao et al., 10 Feb 2025, Qiao et al., 16 May 2025).
Per-slot receiver complexity is determined by:
| Operation | Complexity | Dominant Factors |
|---|---|---|
| Token detection | $\mathcal{O}(LQM)$ per AMP iteration and slot | Matrix projections onto the $Q$ codebook columns |
| Assignment | $\mathcal{O}(nKM)$ per K-means iteration | Channel clustering of the $n$ detected CSI vectors into $K$ groups |
| Masked prediction | $\mathcal{O}(N^2 d)$ per sample | Transformer embedding dimension $d$ and sequence length $N$; reduced by candidate restriction |
The candidate set restriction for masked prediction permits efficient use of large token vocabularies and long sequences (Qiao et al., 10 Feb 2025).
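A quick back-of-envelope comparison shows the effect of candidate restriction; the vocabulary size, collision-set size, and mask count below are illustrative, not figures from the source.

```python
Q = 1024                 # token vocabulary size (illustrative)
collision_set_size = 4   # detected colliding candidates per masked slot (illustrative)
masked_positions = 50    # masked tokens in one sequence (illustrative)

full_search = masked_positions * Q                   # unrestricted masked prediction
restricted = masked_positions * collision_set_size   # ToDMA's candidate-restricted search
print(full_search, restricted, full_search // restricted)  # 51200 200 256
```

Under these toy numbers, restricting the scorer to the collision set reduces per-sequence candidate evaluations by a factor of $Q / |\mathcal{T}|$, here 256x.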
Recommended system design principles include:
- Sufficiently many BS antennas relative to the number of active devices, so that least-squares separation of superimposed channels is well conditioned;
- Codebook orthonormality and adequate transformer depth (8–12 layers) for effective masked prediction;
- Detection and assignment thresholds calibrated to the target accuracy (Qiao et al., 10 Feb 2025).
6. Performance Evaluation
Simulation studies in image (ImageNet-100, VQ-GAN, MaskGIT) and text (QUOTES500K, BERT) transmission scenarios confirm the following performance characteristics (Qiao et al., 16 May 2025, Qiao et al., 10 Feb 2025):
- Token Error Rate (TER): ToDMA maintains a TER close to that of error-free orthogonal QAM (Orth-Com) as the number of active devices increases, while context-unaware non-orthogonal schemes degrade rapidly under collisions.
- Semantic Quality: ToDMA's image PSNR remains within 1–2 dB of Orth-Com (which reaches PSNR ≈ 32 dB) as the active device count grows; text BERTScore reaches ≈ 0.92, versus ≈ 0.78 for a non-contextual baseline. Visual reconstructions are nearly indistinguishable from those of the orthogonal scheme (Qiao et al., 16 May 2025).
- Latency: ToDMA achieves lower end-to-end latency relative to Orth-Com at typical BER targets, due to non-orthogonal, grant-free access and minimal pre-transmission coordination.
- Collision Recovery: MLLM-based masked token prediction successfully resolves >90% of token collisions, as shown in token-map heatmaps and output quality (Qiao et al., 16 May 2025, Qiao et al., 10 Feb 2025).
7. Insights, Limitations, and Design Implications
Key properties of ToDMA, supported by empirical results and theoretical design, include:
- Exploitation of semantic orthogonality enables disentanglement of overlapping codewords at the receiver, leveraging the low inter-sequence similarity in embedding space.
- Joint source-channel coding is realized in the token domain, bypassing the inefficiency of separate bit-wise coding.
- Grant-free operation with global codebook/tokenizer coordination dramatically reduces signaling overhead and scales to large device populations.
- Leveraging masked tokens and MLLMs for context-driven completion enables robust resolution of collisions with tractable inference complexity.
- A plausible implication is that future ToDMA extensions could adaptively tune codebook size, sequence length, and transformer depth to modality and application requirements.
ToDMA currently assumes shared tokenization/modulation infrastructure and well-calibrated model/antenna configurations; deviations may degrade separation or completion accuracy. The reliance on large, pre-trained context models (e.g., MaskGIT, BERT) implies scalability constraints in resource-limited settings. These considerations delineate active research directions within semantic communications leveraging token-domain access mechanisms (Qiao et al., 16 May 2025, Qiao et al., 10 Feb 2025).