Elastic-LASCO: Adaptive Neural CSI Feedback

Updated 20 December 2025
  • Elastic-LASCO is an adaptive neural framework for environment-specific CSI feedback that integrates a pre-trained large model (LAM) with a lightweight, environment-specific small model (SAM).
  • The method uses a learnable coefficient to dynamically balance outputs, achieving near-optimal adaptation in latency-constrained and low-data scenarios.
  • By combining shared LAM priors with fast SAM tuning, Elastic-LASCO reduces training costs and inference time for massive MIMO systems.

Elastic-LASCO (E-LASCO) is an adaptive neural framework for environment-specific channel state information (CSI) feedback in massive MIMO wireless systems. It extends the Large and Small Model Collaboration (LASCO) methodology by introducing a learnable collaboration coefficient, enabling dynamic balancing between the outputs of a large, pre-trained model (LAM) and a lightweight, environment-specific model (SAM). E-LASCO achieves near-optimal environment adaptation with minimal computation, training cost, and data requirements, while preserving the LAM’s generalized knowledge base. The method is particularly suited for environments with limited adaptation data or operational constraints on latency and memory (Cui et al., 13 Dec 2025).

1. Problem Formulation and Motivation

In frequency-division duplex (FDD) massive MIMO systems, the user equipment (UE) must feed back the downlink channel matrix $H \in \mathbb{C}^{N_t \times N_c}$ to the base station (BS). The channel matrix is vectorized, concatenating real and imaginary components into a real-valued representation of dimension $2N_tN_c$, and compressed via a linear projection $s = A\,\mathrm{vec}(H) \in \mathbb{R}^M$ with compression ratio $\gamma = M/(2N_tN_c) \ll 1$. The BS performs a coarse inversion $H_\mathrm{in} = \mathrm{devec}(A^\dagger s)$ and refines the result with a neural reconstruction $\widehat{H} = f_\mathrm{NN}(H_\mathrm{in})$. Standard losses include the normalized MSE $l_\mathrm{MSE}(H,\widehat{H}) = \|\widehat{H}-H\|_2^2 / \|H\|_2^2$ and the negative generalized cosine similarity (GCS).
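
This pipeline can be illustrated with a short numerical sketch. The antenna, subcarrier, and feedback dimensions below are illustrative assumptions (not values from the paper), chosen so that $\gamma = 0.25$; the neural refinement step is omitted.

```python
# Minimal numpy sketch of the CSI feedback pipeline described above.
import numpy as np

Nt, Nc = 32, 32              # transmit antennas, subcarriers (assumed values)
M = 512                      # feedback dimension -> gamma = M / (2*Nt*Nc) = 0.25

# Random complex channel matrix standing in for a measured downlink channel.
H = (np.random.randn(Nt, Nc) + 1j * np.random.randn(Nt, Nc)) / np.sqrt(2)

# Real-valued vectorization: stack real and imaginary parts (length 2*Nt*Nc).
h = np.concatenate([H.real.ravel(), H.imag.ravel()])

# UE side: linear compression s = A vec(H) with a fixed random projection A.
A = np.random.randn(M, 2 * Nt * Nc) / np.sqrt(M)
s = A @ h

# BS side: coarse inversion via the pseudoinverse; the full system would then
# apply the neural refinement H_hat = f_NN(H_in).
h_in = np.linalg.pinv(A) @ s
H_in = (h_in[: Nt * Nc] + 1j * h_in[Nt * Nc:]).reshape(Nt, Nc)

# Normalized MSE between the coarse estimate and the true channel.
nmse = np.linalg.norm(H_in - H) ** 2 / np.linalg.norm(H) ** 2
print(f"coarse-inversion NMSE: {10 * np.log10(nmse):.2f} dB")
```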

While pre-trained LAMs capture broad channel priors, they perform suboptimally in specific environments. Direct fine-tuning of LAMs is impractical due to high computational cost, latency, inference inefficiency in large-scale deployments, catastrophic forgetting, and inaccessible parameters. Conversely, standalone SAMs can rapidly adapt but suffer from poor generalization and possible physical inconsistency in channel reconstruction. The collaborative paradigm in LASCO leverages the LAM for universal priors and the SAM for rapid, environment-local adaptation.

2. LASCO Architecture and Mechanism

LASCO divides the learning process into contributions from a frozen, pre-trained LAM ($f_\mathrm{base}$) and two SAMs: a reference network ($f_\mathrm{ref}$) and a proxy network ($f_\mathrm{pxy}$). All three networks accept the same initial input $H_\mathrm{in}$.

  • $f_\mathrm{base}$: pre-trained on pooled data from 100+ regions (20-block Transformer); frozen during per-environment adaptation; provides the general reconstruction $Q_\mathrm{base}=f_\mathrm{base}(H_\mathrm{in})$.
  • $f_\mathrm{ref}$: same 2-block Transformer architecture as $f_\mathrm{pxy}$; co-trained to mimic $f_\mathrm{base}$ on general data, such that $f_\mathrm{ref}(H_\mathrm{in}) \approx f_\mathrm{base}(H_\mathrm{in})$.
  • $f_\mathrm{pxy}$: initialized from $f_\mathrm{ref}$ and fine-tuned on environment-specific samples.

The environment shift is quantified as $\Delta(H_\mathrm{in}) = f_\mathrm{pxy}(H_\mathrm{in}) - f_\mathrm{ref}(H_\mathrm{in})$ and applied additively:

$$\widehat{H} = f_\mathrm{base}(H_\mathrm{in}) + \Delta(H_\mathrm{in}) = f_\mathrm{base}(H_\mathrm{in}) + f_\mathrm{pxy}(H_\mathrm{in}) - f_\mathrm{ref}(H_\mathrm{in})$$

LASCO’s proxy-aware loss for training $f_\mathrm{pxy}$ is given by

$$l_1 = \big\| H - \big[ f_\mathrm{base}(H_\mathrm{in}) + f_\mathrm{pxy}(H_\mathrm{in}) - f_\mathrm{ref}(H_\mathrm{in}) \big] \big\|_2^2$$
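
A minimal PyTorch sketch of this combination rule and proxy-aware loss is given below. Here $f_\mathrm{base}$, $f_\mathrm{ref}$, and $f_\mathrm{pxy}$ are treated as generic reconstruction modules with matching input/output shapes; their Transformer internals are not reproduced, and the batching convention is an assumption.

```python
# Hedged sketch of the LASCO combination rule and proxy-aware loss (PyTorch).
# f_base, f_ref, f_pxy are assumed to be nn.Module networks mapping H_in to a
# refined channel estimate of the same shape (batch dimension first).
import torch

def lasco_reconstruct(h_in, f_base, f_ref, f_pxy):
    """H_hat = f_base(H_in) + [f_pxy(H_in) - f_ref(H_in)]."""
    with torch.no_grad():                 # LAM and reference SAM are frozen
        q_base = f_base(h_in)
        q_ref = f_ref(h_in)
    return q_base + f_pxy(h_in) - q_ref   # only f_pxy carries gradients

def proxy_aware_loss(h_true, h_in, f_base, f_ref, f_pxy):
    """l1 = || H - [f_base + f_pxy - f_ref](H_in) ||_2^2, averaged over the batch."""
    h_hat = lasco_reconstruct(h_in, f_base, f_ref, f_pxy)
    return ((h_hat - h_true) ** 2).flatten(1).sum(dim=1).mean()
```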

3. Elastic-LASCO: Learnable Collaboration and Adaptation

E-LASCO introduces a learnable scalar (or low-dimensional) coefficient $\alpha$ that governs the relative contribution of the LAM and SAM outputs per environment. The reconstruction thus becomes

$$\widehat{H}(\alpha) = f_\mathrm{pxy}(H_\mathrm{in}) + \alpha\,\big[ f_\mathrm{base}(H_\mathrm{in}) - f_\mathrm{ref}(H_\mathrm{in}) \big]$$

The adaptation objective is to jointly optimize $f_\mathrm{pxy}$ and $\alpha$ on region-specific data $\mathcal{D}_S$:

$$\min_{f_\mathrm{pxy},\,\alpha}\; \mathbb{E}_{(H_\mathrm{in},\,H)\in\mathcal{D}_S} \Big\| H - \big( f_\mathrm{pxy}(H_\mathrm{in}) + \alpha\,[ f_\mathrm{base}(H_\mathrm{in}) - f_\mathrm{ref}(H_\mathrm{in}) ] \big) \Big\|_2^2$$

This formulation removes the need for a manual hyperparameter search over $\alpha$ and automatically tunes the LAM–SAM contribution for each adaptation scenario.
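
One way to realize this learnable collaboration in PyTorch is sketched below; the module name and interface are assumptions. The coefficient $\alpha$ is implemented as a scalar `nn.Parameter` initialized at 1, so that the plain LASCO combination is recovered before adaptation.

```python
# Hedged sketch of the E-LASCO combination with a learnable scalar alpha.
import torch
import torch.nn as nn

class ElasticLASCO(nn.Module):
    def __init__(self, f_base: nn.Module, f_ref: nn.Module, f_pxy: nn.Module):
        super().__init__()
        self.f_base, self.f_ref, self.f_pxy = f_base, f_ref, f_pxy
        # Learnable collaboration coefficient, trained jointly with f_pxy.
        self.alpha = nn.Parameter(torch.tensor(1.0))
        # Freeze the LAM and the reference SAM.
        for p in list(self.f_base.parameters()) + list(self.f_ref.parameters()):
            p.requires_grad_(False)

    def forward(self, h_in):
        # H_hat(alpha) = f_pxy(H_in) + alpha * [f_base(H_in) - f_ref(H_in)]
        with torch.no_grad():
            correction = self.f_base(h_in) - self.f_ref(h_in)
        return self.f_pxy(h_in) + self.alpha * correction
```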

4. Model Training, Adaptation, and Inference Workflow

Pre-training is performed once, using the MSE (and optionally GCS) loss, on a comprehensive CSI data pool. $f_\mathrm{base}$ (20-block Transformer) and $f_\mathrm{ref}$ (2-block Transformer) are jointly optimized, with $f_\mathrm{ref}$ trained to align with $f_\mathrm{base}$. For each new deployment region:

  • A region-specific CSI dataset $\mathcal{D}_S$ (typically 1k–8k samples) is gathered.
  • $f_\mathrm{pxy}$ is initialized from $f_\mathrm{ref}$; $\alpha$ typically starts at 1.
  • $f_\mathrm{base}$ and $f_\mathrm{ref}$ are frozen.
  • Adaptation optimizes the joint objective above with stochastic gradient-based updates (AdamW, learning rate $10^{-3}$), using early stopping on validation NMSE (a minimal loop sketch follows this list).
  • Only $f_\mathrm{pxy}$ (about 0.5% of the LAM’s size) and $\alpha$ are updated. This yields fast adaptation (typically tens of epochs), memory efficiency, and rapid inference, since $f_\mathrm{base}$ can be batched across users while $f_\mathrm{pxy}$ runs per environment.
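
A minimal per-region adaptation loop consistent with this workflow is sketched below. The data-loader interface, validation split, and early-stopping patience are assumptions; `model` refers to an `ElasticLASCO`-style module as in the previous sketch, with channels flattened to shape (batch, $2N_tN_c$).

```python
# Hedged sketch of per-region adaptation: only f_pxy and alpha are trained.
import copy
import torch

def adapt_to_region(model, train_loader, val_loader, max_epochs=100, patience=5):
    # Trainable parameters: proxy SAM weights and the collaboration coefficient.
    params = list(model.f_pxy.parameters()) + [model.alpha]
    opt = torch.optim.AdamW(params, lr=1e-3)

    def nmse(h_hat, h_true):
        # Batched NMSE over flattened real-valued channels.
        err = ((h_hat - h_true) ** 2).sum(dim=1)
        return (err / (h_true ** 2).sum(dim=1)).mean()

    best_val, best_state, bad_epochs = float("inf"), None, 0
    for epoch in range(max_epochs):
        model.train()
        for h_in, h_true in train_loader:
            loss = nmse(model(h_in), h_true)
            opt.zero_grad()
            loss.backward()
            opt.step()

        # Early stopping on validation NMSE.
        model.eval()
        with torch.no_grad():
            val = sum(float(nmse(model(h), y)) for h, y in val_loader) / len(val_loader)
        if val < best_val:
            best_val, bad_epochs = val, 0
            best_state = (copy.deepcopy(model.f_pxy.state_dict()),
                          model.alpha.detach().clone())
        else:
            bad_epochs += 1
            if bad_epochs >= patience:
                break

    if best_state is not None:            # restore best proxy weights and alpha
        model.f_pxy.load_state_dict(best_state[0])
        with torch.no_grad():
            model.alpha.copy_(best_state[1])
    return model
```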

5. Quantitative Performance and Resource Consumption

Numerical evaluations over 10 held-out test regions (compression ratio $\gamma \approx 0.25$, adaptation set size 8k unless noted) establish the following:

| Method | NMSE (dB) | GCS | Adaptation Epochs | Additional FLOPs |
|---|---|---|---|---|
| Pre-trained LAM | –8 | 0.92 | N/A | Baseline |
| Fine-tuned SAM | –10 | 0.94 | 60 | +2 blocks/region |
| LASCO (fixed $\alpha=0.7$) | –12 | 0.96 | 45 | +4% |
| E-LASCO (learned $\alpha$) | –13 | 0.97 | 30 | +4% |

With only 1,000 adaptation samples, the fine-tuned SAM stalls at an NMSE of –9 dB, LASCO reaches –11 dB, and E-LASCO achieves –12 dB. Baseline joint fine-tuning of LAM+SAM does not converge with fewer than 2,000 samples. E-LASCO typically halves adaptation time compared to isolated SAM fine-tuning. The marginal cost is approximately +4% FLOPs for a gain of more than 2 dB, leveraging LAM computation shared across users.

6. Methodological Insights and Applicability

E-LASCO’s architecture exploits the division of labor between a universal LAM and fast-adapting SAMs, enabling black-box, computation-light, and data-efficient environment adaptation. The dual-SAM mechanism (reference and proxy) ensures task consistency and faithful emulation of LAM fine-tuning behavior. The learnable coefficient $\alpha$ provides dynamic adaptation, enhancing robustness to varying degrees of distribution shift: lower $\alpha$ increases reliance on the SAM in out-of-distribution environments, while higher $\alpha$ defers to the LAM in familiar settings.

E-LASCO is most advantageous under constraints on adaptation data size (<5k samples) or where low-latency, low-memory deployment is required. Although E-LASCO is motivated by the CSI feedback use case, the underlying elastic collaboration principle is directly applicable to tasks such as channel prediction and beam management in the air-interface domain.

7. Practical Considerations and Limitations

E-LASCO’s main constraints arise when extreme domain shifts fundamentally limit universal LAM transfer, in which case $\alpha$ approaches the regime of pure SAM reliance. The method assumes availability of a pre-trained LAM and a reference SAM and depends on the initial alignment between $f_\mathrm{base}$ and $f_\mathrm{ref}$. While joint LAM+SAM adaptation is possible in principle, its data requirements for stable convergence are substantially higher than those of the proposed two-stage paradigm. Efficient inference is ensured by sharing the LAM across users and maintaining minimal per-environment state in the SAM proxy and the scalar coefficient.

Elastic-LASCO constitutes a significant contribution for environment-adaptive neural CSI feedback, achieving the performance of full LAM fine-tuning at a fraction of the trainable parameter count, data requirement, and latency, while upholding the generalization capacity of the base model (Cui et al., 13 Dec 2025).
