Token-Domain Multiple Access (ToDMA)
- Token-Domain Multiple Access (ToDMA) is a semantic communications framework that encodes data into token sequences using shared pre-trained tokenization and modulation codebooks.
- It achieves high transmission efficiency with low end-to-end latency, attaining semantic quality (e.g., PSNR for images, BERTScore for text) close to that of orthogonal transmission.
- The framework employs compressed sensing, channel clustering, and masked token prediction via multimodal language models to effectively resolve over 90% of token collisions.
Token-Domain Multiple Access (ToDMA) is a large-model-driven grant-free multiple access framework for semantic communications in which a massive set of devices encode, transmit, and recover information in the token domain. ToDMA leverages pre-trained tokenization and modulation codebooks, compressed sensing for token detection, channel state information (CSI) clustering for user separation, and context-aware masked token prediction using multimodal LLMs (MLLMs) to resolve token collisions. This pipeline achieves high transmission efficiency and low end-to-end latency, with semantic-level quality approaching that of orthogonal schemes and surpassing context-unaware non-orthogonal communication schemes, across text and image modalities (Qiao et al., 16 May 2025, Qiao et al., 10 Feb 2025).
1. Architecture and Signal Processing Pipeline
ToDMA defines a unified transmitter and receiver pipeline implemented as follows:
- Tokenization (Source Coding): Each device encodes its source data (such as an image patch or text sequence) into a sequence of $N$ tokens $v_1, \dots, v_N$ via a shared learned tokenizer, where each $v_n$ indexes a common token codebook of size $Q$. Tokens are equivalently represented as one-hot vectors $\mathbf{e}_{v_n} \in \{0,1\}^{Q}$ (Qiao et al., 16 May 2025, Qiao et al., 10 Feb 2025).
- Modulation (Channel Coding): Token indices are mapped by a shared complex-valued modulation codebook $\mathbf{C} = [\mathbf{c}_1, \dots, \mathbf{c}_Q] \in \mathbb{C}^{L \times Q}$, so each device $k$ transmits over $N$ time slots the codeword matrix $\mathbf{S}_k = [\mathbf{c}_{v_{k,1}}, \dots, \mathbf{c}_{v_{k,N}}]$. Orthonormality among codewords ($\mathbf{c}_i^{\mathsf{H}}\mathbf{c}_j \approx \delta_{ij}$) is enforced for efficient projection and separation (Qiao et al., 10 Feb 2025).
- Channel Model: In a grant-free uplink, $K$ active devices simultaneously transmit to an $M$-antenna base station (BS). The received signal at slot $t$ is
$$\mathbf{Y}_t = \sum_{k=1}^{K} \mathbf{c}_{v_{k,t}} \mathbf{h}_k^{\mathsf{T}} + \mathbf{N}_t \in \mathbb{C}^{L \times M},$$
where $\mathbf{h}_k \in \mathbb{C}^{M}$ is the (slot-invariant) channel of device $k$ and $\mathbf{N}_t$ is i.i.d. additive white Gaussian noise (Qiao et al., 16 May 2025, Qiao et al., 10 Feb 2025).
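The transmitter chain and channel model above can be sketched numerically. This is a minimal toy, not the papers' implementation: the tokenizer is abstracted as given token indices, and the vocabulary size, codeword length, antenna count, device count, and sequence length are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
Q, L, M, K, N = 64, 64, 8, 3, 5  # vocab size, codeword length, antennas, devices, slots (toy values)

# Shared modulation codebook with orthonormal columns (via QR of a random complex matrix).
C, _ = np.linalg.qr(rng.standard_normal((L, Q)) + 1j * rng.standard_normal((L, Q)))

tokens = rng.integers(0, Q, size=(K, N))  # each device's token sequence (tokenizer abstracted away)
# Slot-invariant Rayleigh channels, one M-vector per device.
H = (rng.standard_normal((K, M)) + 1j * rng.standard_normal((K, M))) / np.sqrt(2)

def received_signal(t, snr_db=20.0):
    """Superimpose all devices' slot-t codewords at the M-antenna BS, plus AWGN."""
    Y = sum(np.outer(C[:, tokens[k, t]], H[k]) for k in range(K))
    noise_std = 10 ** (-snr_db / 20)
    Y += noise_std * (rng.standard_normal((L, M)) + 1j * rng.standard_normal((L, M))) / np.sqrt(2)
    return Y

Y0 = received_signal(0)
print(Y0.shape)  # (L, M)
```

Each slot's received matrix is the superposition of one codebook column per active device, scaled along the antenna dimension by that device's channel, which is exactly the structure the receiver later exploits.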
2. Receiver Design: Token Detection and Channel Clustering
The base station implements a multi-step recovery procedure:
- Compressed Sensing Token Detection: For each time slot $t$, the receiver projects $\mathbf{Y}_t$ onto the columns of $\mathbf{C}$, exploiting the model
$$\mathbf{Y}_t = \mathbf{C}\mathbf{X}_t + \mathbf{N}_t,$$
where $\mathbf{X}_t \in \mathbb{C}^{Q \times M}$ is row-sparse, its nonzero rows encoding which tokens are active. Approximate Message Passing (AMP) is employed to detect the active token set and estimate their CSI vectors (Qiao et al., 16 May 2025).
- Channel Clustering for Source Assignment: Each device's channel remains constant over the $N$ slots. For robust token-to-user assignment, the receiver clusters the estimated CSI vectors into $K$ groups using K-means++. Each detected token is then assigned to the closest cluster center (user), yielding a partially reconstructed token sequence per device (Qiao et al., 16 May 2025).
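The receiver side can be sketched as follows, under toy assumptions: a simple matched-filter projection with energy thresholding stands in for AMP, plain Lloyd iterations stand in for K-means++, and the channel is noiseless; dimensions are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)
Q, L, M, K, N = 64, 64, 8, 3, 6  # toy dimensions, not values from the source
C, _ = np.linalg.qr(rng.standard_normal((L, Q)) + 1j * rng.standard_normal((L, Q)))
tokens = rng.integers(0, Q, size=(K, N))
H = (rng.standard_normal((K, M)) + 1j * rng.standard_normal((K, M))) / np.sqrt(2)

def detect_slot(Y, thresh=0.5):
    """Matched-filter stand-in for AMP: project onto the codebook, keep high-energy rows."""
    X_hat = C.conj().T @ Y                       # (Q, M): row q ≈ sum of channels of devices sending token q
    energy = np.linalg.norm(X_hat, axis=1)
    active = np.where(energy > thresh * energy.max())[0]
    return active, X_hat[active]                 # detected token indices and their CSI estimates

# Gather CSI estimates over all slots, then cluster them into K user groups (Lloyd iterations).
csi = np.vstack([detect_slot(sum(np.outer(C[:, tokens[k, t]], H[k]) for k in range(K)))[1]
                 for t in range(N)])
centroids = csi[rng.choice(len(csi), K, replace=False)]
for _ in range(20):
    dist = np.linalg.norm(csi[:, None, :] - centroids[None], axis=2)
    labels = dist.argmin(axis=1)
    centroids = np.array([csi[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
                          for j in range(K)])
```

Because the channels are slot-invariant, CSI estimates of the same device cluster tightly across slots; assigning each detected token to its nearest centroid recovers per-device sequences except where collisions merge the CSI rows.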
3. Semantic Orthogonality and Collision Mitigation
- Semantic Orthogonality: Token sequences from different devices occupy distant regions in a learned embedding space, measured via low average cosine similarity or large distributional divergence: for devices $i \neq j$ with embedded sequences $\mathbf{z}_i, \mathbf{z}_j$,
$$\mathrm{sim}(i,j) = \frac{\langle \mathbf{z}_i, \mathbf{z}_j \rangle}{\|\mathbf{z}_i\|\,\|\mathbf{z}_j\|} \approx 0.$$
This separation enables disentangling of overlapping transmissions even when individual tokens collide (Qiao et al., 10 Feb 2025).
- Collision Handling via Masked Prediction: When multiple devices choose the same token in the same slot, clustering cannot resolve assignments, and the corresponding positions are masked ([MASK]). Each device's partial token sequence is completed using a pre-trained multimodal LLM (MLLM), e.g., BERT (text) or MaskGIT (image), by predicting candidate tokens in context, restricted to the detected collision set $\mathcal{T}_t$. This targeted search reduces masked token prediction complexity from $\mathcal{O}(Q)$ to $\mathcal{O}(|\mathcal{T}_t|)$ candidate evaluations per masked position (Qiao et al., 16 May 2025, Qiao et al., 10 Feb 2025).
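Candidate-restricted masked prediction can be illustrated with a toy contextual scorer; here a bigram frequency model stands in for the MLLM (in ToDMA this role is played by BERT or MaskGIT), and the corpus and token values are invented for the example.

```python
from collections import Counter

# Toy contextual scorer: bigram counts learned from a tiny corpus of token sequences
# (a stand-in for a pre-trained MLLM such as BERT or MaskGIT).
corpus = [[1, 2, 3, 4], [1, 2, 3, 5], [7, 2, 3, 4], [1, 2, 9, 4]]
bigrams = Counter((s[i], s[i + 1]) for s in corpus for i in range(len(s) - 1))

def fill_masked(seq, collision_set, mask=None):
    """Replace each masked position with the candidate (drawn only from the
    detected collision set) that best fits its unmasked neighbours."""
    out = list(seq)
    for i, tok in enumerate(out):
        if tok is not mask:
            continue
        def score(v):
            s = 0
            if i > 0 and out[i - 1] is not mask:
                s += bigrams[(out[i - 1], v)]       # fit with left context
            if i + 1 < len(out) and out[i + 1] is not mask:
                s += bigrams[(v, out[i + 1])]       # fit with right context
            return s
        out[i] = max(collision_set, key=score)      # O(|collision_set|) scorings, not O(Q)
    return out

print(fill_masked([1, None, 3, 4], collision_set={2, 9}))  # → [1, 2, 3, 4]
```

The key point carries over to the real system: the transformer only scores tokens known to have collided in that slot, so a large vocabulary never has to be searched exhaustively.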
4. Bayesian Interpretation and Receiver Inference
Maximum a posteriori (MAP) inference for the joint set of all tokens transmitted by all devices is formally intractable; ToDMA instead operationalizes an approximate MAP solution by chaining:
- Matched-filter token detection and energy thresholding;
- Nearest-CSI channel assignment (clustering);
- Contextual masked token re-prediction, as a conditional MAP estimate restricted to detected collision candidates, with the transformer acting as a probabilistic scorer (Qiao et al., 10 Feb 2025).
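In notation assumed here (not taken verbatim from the source), the chained approximation replaces the exact joint MAP with a per-position conditional MAP over the detected collision candidates:

```latex
% Exact joint MAP over all devices' token sequences -- intractable:
\{\hat{v}_{k,t}\} = \arg\max_{\{v_{k,t}\}} \; p\big(\{v_{k,t}\} \mid \mathbf{Y}_{1:N}\big)

% ToDMA's chained surrogate: per-slot detection and CSI assignment fix the
% unambiguous tokens; each masked position is then resolved by a conditional
% MAP restricted to the detected collision set \mathcal{T}_t, with the
% transformer p_\theta acting as the probabilistic scorer:
\hat{v}_{k,t} = \arg\max_{v \in \mathcal{T}_t} \; p_\theta\big(v \mid \text{unmasked tokens of device } k\big)
```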
5. Practical Implementation and Complexity
A single shared tokenizer and modulation codebook ensure coherent symbol mapping and efficient grant-free random access across a large device set. Transformer-based tokenizers provide context-sensitive token quantization, increasing semantic robustness (Qiao et al., 10 Feb 2025, Qiao et al., 16 May 2025).
Per-slot receiver complexity is determined by:
| Operation | Complexity | Dominant Factors |
|---|---|---|
| Token detection | $\mathcal{O}(LQM)$ per AMP iteration and slot | Matrix projections onto the $Q$ codebook columns |
| Assignment | $\mathcal{O}(nKM)$ per K-means iteration | Channel clustering of the $n$ detected CSI vectors into $K$ groups |
| Masked prediction | $\mathcal{O}(N^2 d)$ per sample | Transformer embedding dimension $d$ and sequence length $N$; reduced by candidate restriction |
The candidate set restriction for masked prediction permits efficient use of large token vocabularies and long sequences (Qiao et al., 10 Feb 2025).
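A quick back-of-envelope comparison shows the effect of candidate restriction; the vocabulary size, collision-set size, and mask count below are illustrative, not figures from the source.

```python
Q = 1024                 # token vocabulary size (illustrative)
collision_set_size = 4   # detected colliding candidates per masked slot (illustrative)
masked_positions = 50    # masked tokens in one sequence (illustrative)

full_search = masked_positions * Q                   # unrestricted masked prediction
restricted = masked_positions * collision_set_size   # ToDMA's candidate-restricted search
print(full_search, restricted, full_search // restricted)  # 51200 200 256
```

Under these toy numbers, restricting the scorer to the collision set reduces per-sequence candidate evaluations by a factor of $Q / |\mathcal{T}|$, here 256x.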
Recommended system design principles include:
- Sufficiently many BS antennas relative to the number of active devices, so that least-squares separation of superimposed channels is well conditioned;
- Codebook orthonormality and adequate transformer depth (8–12 layers) for effective masked prediction;
- Detection and assignment thresholds calibrated to the target accuracy (Qiao et al., 10 Feb 2025).
6. Performance Evaluation
Simulation studies in image (ImageNet-100, VQ-GAN, MaskGIT) and text (QUOTES500K, BERT) transmission scenarios confirm the following performance characteristics (Qiao et al., 16 May 2025, Qiao et al., 10 Feb 2025):
- Token Error Rate (TER): ToDMA maintains a TER close to that of error-free orthogonal QAM (Orth-Com) as the number of active devices increases, while context-unaware non-orthogonal schemes degrade rapidly under collisions.
- Semantic Quality: ToDMA's image PSNR remains within 1–2 dB of Orth-Com (which reaches PSNR ≈ 32 dB) as the active device count grows; text BERTScore reaches ≈ 0.92, versus ≈ 0.78 for a non-contextual baseline. Visual reconstructions are nearly indistinguishable from those of the orthogonal scheme (Qiao et al., 16 May 2025).
- Latency: ToDMA achieves lower end-to-end latency relative to Orth-Com at typical BER targets, due to non-orthogonal, grant-free access and minimal pre-transmission coordination.
- Collision Recovery: MLLM-based masked token prediction successfully resolves >90% of token collisions, as shown in token-map heatmaps and output quality (Qiao et al., 16 May 2025, Qiao et al., 10 Feb 2025).
7. Insights, Limitations, and Design Implications
Key properties of ToDMA, supported by empirical results and theoretical design, include:
- Exploitation of semantic orthogonality enables disentanglement of overlapping codewords at the receiver, leveraging the low inter-sequence similarity in embedding space.
- Joint source-channel coding is realized in the token domain, bypassing the inefficiency of separate bit-wise coding.
- Grant-free operation with global codebook/tokenizer coordination dramatically reduces signaling overhead and scales to large device populations.
- Leveraging masked tokens and MLLMs for context-driven completion enables robust resolution of collisions with tractable inference complexity.
- A plausible implication is that future ToDMA extensions could adaptively tune codebook size, sequence length, and transformer depth to modality and application requirements.
ToDMA currently assumes shared tokenization/modulation infrastructure and well-calibrated model/antenna configurations; deviations may degrade separation or completion accuracy. The reliance on large, pre-trained context models (e.g., MaskGIT, BERT) implies scalability constraints in resource-limited settings. These considerations delineate active research directions within semantic communications leveraging token-domain access mechanisms (Qiao et al., 16 May 2025, Qiao et al., 10 Feb 2025).