Zero-Shot CSI Lossless Compressors

Updated 17 January 2026

The paper introduces a zero-shot framework that leverages large language models as probability oracles to achieve near-entropy lossless CSI compression without domain-specific fine-tuning.
It presents a hybrid LM–FM architecture that adaptively partitions quantized CSI features between context-aware autoregressive coding and parallel factorized coding.
The framework effectively manages rate-distortion-complexity trade-offs, demonstrating competitive compression rates and improved NMSE performance on both real and simulated datasets.

Zero-shot CSI lossless compressors refer to a class of schemes that exploit LLMs as general-purpose, zero-shot probability oracles for arithmetic coding of quantized channel state information (CSI) features, achieving lossless compression rates close to the entropy bound with no CSI-specific fine-tuning. The TCLNet framework exemplifies this approach, combining an LLM-driven, context-aware coding path with a parallel factorized model under adaptive complexity control, and operationalizing lossless compression through prompt engineering and token sequence manipulation (Yang et al., 10 Jan 2026).

1. Hybrid LM–FM Architecture for Lossless CSI Compression

Lossless CSI compression in TCLNet begins with a quantized latent vector $z' = Q(z) \in \{0, \ldots, 2^{n}-1\}^{z}$ , obtained by quantizing the output $z\in \mathbb{R}^z$ of a lossy encoder with $n$ -bit scalar quantization. The lossless stage integrates two probability models:

Autoregressive LLM (LM): Models the ASCII-tokenized CSI sequence $u = \text{ASCII}(z')$ as an autoregressive process, estimating $P_{l} = \{\rho_{k|<k} : k=1\,..\,z\}$ , where $\rho_{k|<k}(a)=\Pr(\text{token}_k=a\,|\,\text{token}_1\ldots \text{token}_{k-1})$ for each position $k$ and ASCII symbol $a$ .
Factorized Model (FM): Provides a marginal probability distribution $P_f = \{\pi_0, \ldots, \pi_{2^n-1}\}$ , representing $\Pr(\text{symbol}=s)$ , $\sum_s \pi_s=1$ .

A complexity-controlled symbol-selection module assigns each position $k$ to either the LM branch (context-aware) or the FM branch (parallel), partitioning $z'$ into disjoint index sets $z'_l$ and $z'_f$ . Separate arithmetic coding is performed on these subsets using the respective probability models, resulting in bitstreams $b_l$ and $b_f$ .

2. Adaptive Switching Between Context-Aware and Parallel Coding

For each quantized symbol index $k$ , TCLNet calculates:

Context-based entropy:

$H_{l,k} = -\sum_{a \in A} \rho_{k|<k}(a)\log_2 \rho_{k|<k}(a)$

Marginal (FM) entropy:

$H_f = -\sum_{s=0}^{2^n-1} \pi_s \log_2 \pi_s$

Entropy gain:

$D_k = H_f - H_{l,k}$

A user-defined complexity variable $c \in [0, 1]$ determines the proportion of positions assigned to FM coding. After sorting all indices by $D_k$ in descending order, the $(1-c)\,z$ most gainful positions are encoded via the LM, and the remaining $cz$ via the FM. Adjusting $c$ enables fine-tuning of the rate-distortion-complexity (RDC) trade-off:

Rate ( $R$ ): Number of bits per element, $R = (|b_l| + |b_f|)/N$ .
Distortion ( $D$ ): Fixed by the lossy module, measured in NMSE.
Complexity ( $T_{\text{lossless}}$ ):

$T_{\text{lossless}} \simeq \max\{(1-c)zT_{\text{lm}},czT_{\text{fm}}\} \approx (1-c)zT_{\text{lm}}$

since $T_{\text{lm}} \gg T_{\text{fm}}$ .

This mechanism allows trade-off between coding efficiency and computational cost, central to deployment in practical systems.

3. Zero-Shot Prompting and Operational Workflow

The key innovation is using general-purpose LLMs, such as ChatGPT-5, in a zero-shot mode with no domain tuning. The process comprises:

ASCII Tokenization: Each quantized integer ( $[0,2^n-1]$ ) maps bijectively to an ASCII character, producing a token sequence interpretable by the LLM.

Prompt Engineering: The LLM is instructed via an in-context prompt to "act as a probability estimator," providing, for each presented context, a probability vector over the ASCII vocabulary in JSON format.

1
2
3

System: You are given a sequence of ASCII tokens representing quantized CSI coefficients. For each next token, you must output a probability vector over the entire ASCII vocabulary in JSON array format, without additional commentary.
User: Input tokens so far: "Gf#…Z"
LLM: [0.001, 0.0005, …, 0.002]

Compression: At each time $t$ , the current context $u_{<t}$ is submitted along with the prompt to the LLM; the returned vector $p_t$ is passed to the arithmetic encoder for the next token.
Decompression: The arithmetic decoder reconstructs the ASCII sequence by querying the LLM with mirrored context and prompt, synchronizing symbol-by-symbol decoding with the encoder's probability estimates. FM-decoded symbols are merged via the indicator matrix. The ASCII sequence is then detokenized, dequantized as $\hat{z} = z' \Delta + z_\text{min}$ , and forwarded to the lossy CSI decoder.

4. Mathematical Formulation of Compression Modules

The formal framework underpinning zero-shot CSI lossless compression includes:

Quantization:

$\Delta = \frac{z_\text{max} - z_\text{min}}{2^n - 1},\qquad z' = \left\lfloor \frac{z - z_\text{min}}{\Delta} \right\rfloor$

Entropy Measures:

$H_{l,k} = -\sum_a \rho_{k|<k}(a) \log_2 \rho_{k|<k}(a)$

$H_f = -\sum_s \pi_s \log_2 \pi_s$

Entropy Gain:

$D_k = H_f - H_{l,k}$

Coding Complexity:

$T_{\text{lossless}} = \max\big\{(1-c)zT_{\text{lm}}, czT_{\text{fm}}\big\} \simeq (1-c)zT_{\text{lm}}$

Coding Rate:

$R = \frac{B}{N}$

where $B$ is the total bit count and $N$ is the length of the original CSI feature vector.

Distortion Metric (NMSE in dB):

$\text{NMSE} = 10 \log_{10} \mathbb{E} \left[ \frac{\|H_a - \hat{H}_a\|_2^2}{\|H_a\|_2^2} \right]$

5. Empirical Performance and Comparative Baselines

Experiments were conducted on both the real Argos 2.4 GHz indoor dataset and the COST2100 simulated channel dataset. The full pipeline included conversion to the angle-delay domain, a lossy encoder operating at $1/128$ compression ratio, $n$ -bit quantization (typically $n=7$ ), and final lossless compression.

Comparison metrics: Bits per element ( $R$ ) and NMSE in dB.

Baselines:

Fixed-length coding at 7 bits per symbol ( $R \approx 0.0546$ ).
FM-only entropy coding ( $R = 0.0534$ ).
CSI-trained Transformer decoder LM ( $R=0.0417$ ), approximating the entropy lower bound ( $R=0.0416$ ).

Zero-shot LLM results:

Model	Parameters	$R$ (bits/element)	CSI-tuning?
ChatGPT-5	$>$ 1T	$0.0513$	No
ChatGPT-4o	$\approx$ 200B	$0.0515$	No

These zero-shot LLM compressors outperform fixed-length and FM-only coding, and without CSI-specific fine-tuning, capture $>98\%$ of the achievable entropy-coding gain.

6. Ablation Insights and Impact of Prompt and Model Selection

Systematic ablation studies reveal several key phenomena:

Complexity Control ( $c$ ) Sweep: Decreasing $c$ (increasing LM usage) drives $R$ toward the entropy bound, but increases $T_{\text{lossless}}$ linearly with $(1-c)$ . Figure 1 in (Yang et al., 10 Jan 2026) displays the RDC trade-off for various quantization precisions.
Swin-Transformer Window Size: Optimal NMSE is achieved at window size $4$ in the lossy module, with performance degrading for smaller windows.
Prompt Design: Use of in-context demonstrations vs. plain instruction yields a minor difference ( $\approx0.002$ bits/element) in $R$ for ChatGPT-5 compared to a domain-trained LM.
Model Choice: A domain-specific LM with $12.5$M parameters outperforms general-purpose LLMs by $\approx0.009$ bits/element. However, general LLMs still realize over 98% of the maximal entropy-coding gain in zero-shot deployment.

7. Significance and Practical Considerations

Flattening quantized CSI features into ASCII tokens and orchestrating general LLMs as zero-shot probability oracles—augmented by a factorized model and an explicitly tunable complexity parameter—enables near-entropy optimal, lossless CSI coding without customized model training or large-scale retraining. This makes practical, runtime deployment of high-efficiency lossless CSI feedback achievable on commodity hardware with on-device LLM capabilities (Yang et al., 10 Jan 2026). A plausible implication is broad applicability in bandwidth-constrained massive MIMO systems where domain-specific re-training is infeasible or undesirable.

Markdown Report Issue Upgrade to Chat

References (1)

TCLNet: A Hybrid Transformer-CNN Framework Leveraging Language Models as Lossless Compressors for CSI Feedback (2026)

Topic to Video (Beta)

No one has generated a video about this topic yet.

Whiteboard

No one has generated a whiteboard explanation for this topic yet.

Follow Topic

Get notified by email when new papers are published related to Zero-Shot CSI Lossless Compressors.