Elastic-LASCO: Adaptive Neural CSI Feedback

Updated 20 December 2025
  • Elastic-LASCO is an adaptive neural framework for environment-specific CSI feedback that integrates a pre-trained large model (LAM) with a lightweight, environment-specific small model (SAM).
  • The method uses a learnable coefficient to dynamically balance outputs, achieving near-optimal adaptation in latency-constrained and low-data scenarios.
  • By combining shared LAM priors with fast SAM tuning, Elastic-LASCO reduces training costs and inference time for massive MIMO systems.

Elastic-LASCO (E-LASCO) is an adaptive neural framework for environment-specific channel state information (CSI) feedback in massive MIMO wireless systems. It extends the Large and Small Model Collaboration (LASCO) methodology by introducing a learnable collaboration coefficient, enabling dynamic balancing between the outputs of a large, pre-trained model (LAM) and a lightweight, environment-specific model (SAM). E-LASCO achieves near-optimal environment adaptation with minimal computation, training cost, and data requirements, while preserving the LAM’s generalized knowledge base. The method is particularly suited for environments with limited adaptation data or operational constraints on latency and memory (Cui et al., 13 Dec 2025).

1. Problem Formulation and Motivation

In frequency-division duplex (FDD) massive MIMO systems, the user equipment (UE) must feed back the downlink channel matrix $H \in \mathbb{C}^{N_t \times N_c}$ to the base station (BS). The channel matrix is vectorized, concatenating real and imaginary components into a real-valued representation of dimension $2N_tN_c$, and compressed via a linear projection $s = A\,\mathrm{vec}(H) \in \mathbb{R}^M$ with compression ratio $\gamma = M/(2N_tN_c) \ll 1$. The BS performs a coarse inversion $H_\mathrm{in} = \mathrm{devec}(A^\dagger s)$ and refines the result with a neural reconstruction $\widehat{H} = f_\mathrm{NN}(H_\mathrm{in})$. Standard losses include the normalized MSE $l_\mathrm{MSE}(H,\widehat{H}) = \|\widehat{H}-H\|_2^2 / \|H\|_2^2$ and the negative generalized cosine similarity (GCS).
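
This pipeline can be illustrated with a short numerical sketch. The antenna, subcarrier, and feedback dimensions below are illustrative assumptions (not values from the paper), chosen so that $\gamma = 0.25$; the neural refinement step is omitted.

```python
# Minimal numpy sketch of the CSI feedback pipeline described above.
import numpy as np

Nt, Nc = 32, 32              # transmit antennas, subcarriers (assumed values)
M = 512                      # feedback dimension -> gamma = M / (2*Nt*Nc) = 0.25

# Random complex channel matrix standing in for a measured downlink channel.
H = (np.random.randn(Nt, Nc) + 1j * np.random.randn(Nt, Nc)) / np.sqrt(2)

# Real-valued vectorization: stack real and imaginary parts (length 2*Nt*Nc).
h = np.concatenate([H.real.ravel(), H.imag.ravel()])

# UE side: linear compression s = A vec(H) with a fixed random projection A.
A = np.random.randn(M, 2 * Nt * Nc) / np.sqrt(M)
s = A @ h

# BS side: coarse inversion via the pseudoinverse; the full system would then
# apply the neural refinement H_hat = f_NN(H_in).
h_in = np.linalg.pinv(A) @ s
H_in = (h_in[: Nt * Nc] + 1j * h_in[Nt * Nc:]).reshape(Nt, Nc)

# Normalized MSE between the coarse estimate and the true channel.
nmse = np.linalg.norm(H_in - H) ** 2 / np.linalg.norm(H) ** 2
print(f"coarse-inversion NMSE: {10 * np.log10(nmse):.2f} dB")
```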

While pre-trained LAMs capture broad channel priors, they perform suboptimally in specific environments. Direct fine-tuning of LAMs is impractical due to high computational cost, latency, inference inefficiency in large-scale deployments, catastrophic forgetting, and inaccessible parameters. Conversely, standalone SAMs can rapidly adapt but suffer from poor generalization and possible physical inconsistency in channel reconstruction. The collaborative paradigm in LASCO leverages the LAM for universal priors and the SAM for rapid, environment-local adaptation.

2. LASCO Architecture and Mechanism

LASCO divides the learning process into contributions from a frozen, pre-trained LAM ($f_\mathrm{base}$) and two SAMs: a reference network ($f_\mathrm{ref}$) and a proxy network ($f_\mathrm{pxy}$). All three networks accept the same initial input $H_\mathrm{in}$.

  • $f_\mathrm{base}$: pre-trained on pooled data from 100+ regions (20-block Transformer); frozen during per-environment adaptation; provides the general reconstruction $Q_\mathrm{base}=f_\mathrm{base}(H_\mathrm{in})$.
  • $f_\mathrm{ref}$: same 2-block Transformer architecture as $f_\mathrm{pxy}$; co-trained to mimic $f_\mathrm{base}$ on general data, such that $f_\mathrm{ref}(H_\mathrm{in}) \approx f_\mathrm{base}(H_\mathrm{in})$.
  • $f_\mathrm{pxy}$: initialized from $f_\mathrm{ref}$ and fine-tuned on environment-specific samples.

The environment shift is quantified as $\Delta(H_\mathrm{in}) = f_\mathrm{pxy}(H_\mathrm{in}) - f_\mathrm{ref}(H_\mathrm{in})$ and applied additively:

$$\widehat{H} = f_\mathrm{base}(H_\mathrm{in}) + \Delta(H_\mathrm{in}) = f_\mathrm{base}(H_\mathrm{in}) + f_\mathrm{pxy}(H_\mathrm{in}) - f_\mathrm{ref}(H_\mathrm{in})$$

LASCO’s proxy-aware loss for training $f_\mathrm{pxy}$ is given by

$$l_1 = \big\| H - \big[ f_\mathrm{base}(H_\mathrm{in}) + f_\mathrm{pxy}(H_\mathrm{in}) - f_\mathrm{ref}(H_\mathrm{in}) \big] \big\|_2^2$$
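
A minimal PyTorch sketch of this combination rule and proxy-aware loss is given below. Here $f_\mathrm{base}$, $f_\mathrm{ref}$, and $f_\mathrm{pxy}$ are treated as generic reconstruction modules with matching input/output shapes; their Transformer internals are not reproduced, and the batching convention is an assumption.

```python
# Hedged sketch of the LASCO combination rule and proxy-aware loss (PyTorch).
# f_base, f_ref, f_pxy are assumed to be nn.Module networks mapping H_in to a
# refined channel estimate of the same shape (batch dimension first).
import torch

def lasco_reconstruct(h_in, f_base, f_ref, f_pxy):
    """H_hat = f_base(H_in) + [f_pxy(H_in) - f_ref(H_in)]."""
    with torch.no_grad():                 # LAM and reference SAM are frozen
        q_base = f_base(h_in)
        q_ref = f_ref(h_in)
    return q_base + f_pxy(h_in) - q_ref   # only f_pxy carries gradients

def proxy_aware_loss(h_true, h_in, f_base, f_ref, f_pxy):
    """l1 = || H - [f_base + f_pxy - f_ref](H_in) ||_2^2, averaged over the batch."""
    h_hat = lasco_reconstruct(h_in, f_base, f_ref, f_pxy)
    return ((h_hat - h_true) ** 2).flatten(1).sum(dim=1).mean()
```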

3. Elastic-LASCO: Learnable Collaboration and Adaptation

E-LASCO introduces a learnable scalar (or low-dimensional) coefficient $\alpha$ that governs the relative contribution of the LAM and SAM outputs per environment. The reconstruction thus becomes

$$\widehat{H}(\alpha) = f_\mathrm{pxy}(H_\mathrm{in}) + \alpha\,\big[ f_\mathrm{base}(H_\mathrm{in}) - f_\mathrm{ref}(H_\mathrm{in}) \big]$$

The adaptation objective is to jointly optimize $f_\mathrm{pxy}$ and $\alpha$ on region-specific data $\mathcal{D}_S$:

$$\min_{f_\mathrm{pxy},\,\alpha}\; \mathbb{E}_{(H_\mathrm{in},\,H)\in\mathcal{D}_S} \Big\| H - \big( f_\mathrm{pxy}(H_\mathrm{in}) + \alpha\,[ f_\mathrm{base}(H_\mathrm{in}) - f_\mathrm{ref}(H_\mathrm{in}) ] \big) \Big\|_2^2$$

This formulation removes the need for a manual hyperparameter search over $\alpha$ and automatically tunes the LAM–SAM contribution for each adaptation scenario.
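
One way to realize this learnable collaboration in PyTorch is sketched below; the module name and interface are assumptions. The coefficient $\alpha$ is implemented as a scalar `nn.Parameter` initialized at 1, so that the plain LASCO combination is recovered before adaptation.

```python
# Hedged sketch of the E-LASCO combination with a learnable scalar alpha.
import torch
import torch.nn as nn

class ElasticLASCO(nn.Module):
    def __init__(self, f_base: nn.Module, f_ref: nn.Module, f_pxy: nn.Module):
        super().__init__()
        self.f_base, self.f_ref, self.f_pxy = f_base, f_ref, f_pxy
        # Learnable collaboration coefficient, trained jointly with f_pxy.
        self.alpha = nn.Parameter(torch.tensor(1.0))
        # Freeze the LAM and the reference SAM.
        for p in list(self.f_base.parameters()) + list(self.f_ref.parameters()):
            p.requires_grad_(False)

    def forward(self, h_in):
        # H_hat(alpha) = f_pxy(H_in) + alpha * [f_base(H_in) - f_ref(H_in)]
        with torch.no_grad():
            correction = self.f_base(h_in) - self.f_ref(h_in)
        return self.f_pxy(h_in) + self.alpha * correction
```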

4. Model Training, Adaptation, and Inference Workflow

Pre-training is performed once, using the MSE (and optionally GCS) loss, on a comprehensive CSI data pool. $f_\mathrm{base}$ (20-block Transformer) and $f_\mathrm{ref}$ (2-block Transformer) are jointly optimized, with $f_\mathrm{ref}$ trained to align with $f_\mathrm{base}$. For each new deployment region:

  • A region-specific CSI dataset $\mathcal{D}_S$ (typically 1k–8k samples) is gathered.
  • $f_\mathrm{pxy}$ is initialized from $f_\mathrm{ref}$; $\alpha$ typically starts at 1.
  • $f_\mathrm{base}$ and $f_\mathrm{ref}$ are frozen.
  • Adaptation optimizes the joint objective above with stochastic gradient-based updates (AdamW, learning rate $10^{-3}$), using early stopping on validation NMSE (a minimal loop sketch follows this list).
  • Only $f_\mathrm{pxy}$ (about 0.5% of the LAM’s size) and $\alpha$ are updated. This yields fast adaptation (typically tens of epochs), memory efficiency, and rapid inference, since $f_\mathrm{base}$ can be batched across users while $f_\mathrm{pxy}$ runs per environment.
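
A minimal per-region adaptation loop consistent with this workflow is sketched below. The data-loader interface, validation split, and early-stopping patience are assumptions; `model` refers to an `ElasticLASCO`-style module as in the previous sketch, with channels flattened to shape (batch, $2N_tN_c$).

```python
# Hedged sketch of per-region adaptation: only f_pxy and alpha are trained.
import copy
import torch

def adapt_to_region(model, train_loader, val_loader, max_epochs=100, patience=5):
    # Trainable parameters: proxy SAM weights and the collaboration coefficient.
    params = list(model.f_pxy.parameters()) + [model.alpha]
    opt = torch.optim.AdamW(params, lr=1e-3)

    def nmse(h_hat, h_true):
        # Batched NMSE over flattened real-valued channels.
        err = ((h_hat - h_true) ** 2).sum(dim=1)
        return (err / (h_true ** 2).sum(dim=1)).mean()

    best_val, best_state, bad_epochs = float("inf"), None, 0
    for epoch in range(max_epochs):
        model.train()
        for h_in, h_true in train_loader:
            loss = nmse(model(h_in), h_true)
            opt.zero_grad()
            loss.backward()
            opt.step()

        # Early stopping on validation NMSE.
        model.eval()
        with torch.no_grad():
            val = sum(float(nmse(model(h), y)) for h, y in val_loader) / len(val_loader)
        if val < best_val:
            best_val, bad_epochs = val, 0
            best_state = (copy.deepcopy(model.f_pxy.state_dict()),
                          model.alpha.detach().clone())
        else:
            bad_epochs += 1
            if bad_epochs >= patience:
                break

    if best_state is not None:            # restore best proxy weights and alpha
        model.f_pxy.load_state_dict(best_state[0])
        with torch.no_grad():
            model.alpha.copy_(best_state[1])
    return model
```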

5. Quantitative Performance and Resource Consumption

Numerical evaluations over 10 held-out test regions (compression ratio $\gamma \approx 0.25$, adaptation set size 8k unless noted) establish the following:

| Method | NMSE (dB) | GCS | Adaptation Epochs | Additional FLOPs |
|---|---|---|---|---|
| Pre-trained LAM | –8 | 0.92 | N/A | Baseline |
| Fine-tuned SAM | –10 | 0.94 | 60 | +2 blocks/region |
| LASCO (fixed $\alpha=0.7$) | –12 | 0.96 | 45 | +4% |
| E-LASCO (learned $\alpha$) | –13 | 0.97 | 30 | +4% |

With only 1,000 adaptation samples, the fine-tuned SAM stalls at an NMSE of –9 dB, LASCO reaches –11 dB, and E-LASCO achieves –12 dB. Baseline joint fine-tuning of LAM+SAM does not converge with fewer than 2,000 samples. E-LASCO typically halves adaptation time compared to isolated SAM fine-tuning. The marginal cost is approximately +4% FLOPs for a gain of more than 2 dB, leveraging LAM computation shared across users.

6. Methodological Insights and Applicability

E-LASCO’s architecture exploits the division of labor between a universal LAM and fast-adapting SAMs, enabling black-box, computation-light, and data-efficient environment adaptation. The dual-SAM mechanism (reference and proxy) ensures task consistency and faithful emulation of LAM fine-tuning behavior. The learnable coefficient $\alpha$ provides dynamic adaptation, enhancing robustness to varying degrees of distribution shift: lower $\alpha$ increases reliance on the SAM in out-of-distribution environments, while higher $\alpha$ defers to the LAM in familiar settings.

E-LASCO is most advantageous under constraints on adaptation data size (<5k samples) or where low-latency, low-memory deployment is required. Although E-LASCO is motivated by the CSI feedback use case, the underlying elastic collaboration principle is directly applicable to tasks such as channel prediction and beam management in the air-interface domain.

7. Practical Considerations and Limitations

E-LASCO’s main constraints arise when extreme domain shifts fundamentally limit universal LAM transfer, in which case $\alpha$ approaches the regime of pure SAM reliance. The method assumes availability of a pre-trained LAM and a reference SAM and depends on the initial alignment between $f_\mathrm{base}$ and $f_\mathrm{ref}$. While joint LAM+SAM adaptation is possible in principle, its data requirements for stable convergence are substantially higher than those of the proposed two-stage paradigm. Efficient inference is ensured by sharing the LAM across users and maintaining minimal per-environment state in the SAM proxy and the scalar coefficient.

Elastic-LASCO constitutes a significant contribution for environment-adaptive neural CSI feedback, achieving the performance of full LAM fine-tuning at a fraction of the trainable parameter count, data requirement, and latency, while upholding the generalization capacity of the base model (Cui et al., 13 Dec 2025).
