
Inference-Time ARQ: Neural-Coded Feedback

Updated 26 February 2026
  • Inference-Time ARQ is a paradigm that replaces traditional 1-bit acknowledgments with high-dimensional, neural-coded feedback to enhance wireless reliability and coverage.
  • It employs asynchronous feedback, transformer-based encoding and decoding, and SNR-conditioned training to minimize latency and boost system performance.
  • Experimental findings demonstrate up to 43% latency reduction, 8.8–9.5 dB SNR gains, and significant access point density reductions, underscoring its practical benefits.

Inference-time ARQ refers to a paradigm in wireless communications where the feedback mechanism during automatic repeat request (ARQ) exploits high-dimensional, information-rich vector feedback in place of traditional 1-bit acknowledgments. This approach transforms feedback from passive acknowledgment to an active collaboration between user equipment (UE) and access point (AP), realized by neural-coded feedback tightly integrated with physical-layer channel coding. The Rich-ARQ scheme exemplifies inference-time ARQ, introducing neural-network based encoders and decoders, asynchronous feedback code design, and a full-stack radio prototype that decouples inference from strict radio timing to address coverage, reliability, and latency bottlenecks in next-generation wireless systems (Chen et al., 8 Feb 2026).

1. System Model and Channel Description

The Rich-ARQ system is deployed on a star-topology link (single UE to AP) layered over an OFDM physical layer. Let $b\in\{0,1\}^K$ denote the information bits. At forward round $t$, the UE uses a neural encoder $f_\text{tx}(\cdot)$ to generate the codeword $c^{(t)}\in \mathbb{C}^M$ (where $M$ is the number of symbols per PRB):

$$c^{(t)} = f_\text{tx}\left(b,\{c^{(0)},\dots,c^{(t-1)}\},\{\hat{y}^{(0)},\dots,\hat{y}^{(t')}\}\right),$$

with $t'\leq t-1$ indexing the latest feedback received. The uplink channel for subcarrier $i$ is

$$y_i^{(t)} = \alpha_i^{(t)} c_i^{(t)} + n_i^{(t)},$$

where $\alpha_i^{(t)}$ is the frequency-domain channel gain and $n_i^{(t)}\sim \mathcal{CN}(0,\sigma^2)$ is AWGN.

At the AP, a neural feedback decoder $g_\text{rx}(\cdot)$ generates a high-dimensional downlink feedback vector $\hat{c}^{(t)}\in\mathbb{R}^{d_\mathrm{fb}}$:

$$\hat{c}^{(t)} = g_\text{rx}\left(y^{(t)},\gamma^{(t)}\right),$$

where $\gamma^{(t)}$ is the instantaneous received-SNR estimate. The feedback channel for subcarrier $i$ is

$$\hat{y}_i^{(t)} = \tilde{\alpha}_i^{(t)} \hat{c}_i^{(t)} + \tilde{w}_i^{(t)},$$

with $\tilde{\alpha}_i^{(t)}$ and $\tilde{w}_i^{(t)}$ denoting the downlink channel gain and AWGN, respectively (Chen et al., 8 Feb 2026).
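As a minimal sketch of the uplink model above, the per-subcarrier relation $y_i^{(t)} = \alpha_i^{(t)} c_i^{(t)} + n_i^{(t)}$ can be simulated directly; the i.i.d. Rayleigh fading and unit-power symbol assumptions below are illustrative choices, not details from the source:

```python
import numpy as np

rng = np.random.default_rng(0)

def awgn_fading_channel(c, snr_db, rng):
    """Apply per-subcarrier fading and AWGN: y_i = alpha_i * c_i + n_i."""
    m = c.shape[0]
    # Frequency-domain channel gains (illustrative i.i.d. Rayleigh fading).
    alpha = (rng.standard_normal(m) + 1j * rng.standard_normal(m)) / np.sqrt(2)
    # Noise variance from the target SNR (unit-power codeword symbols assumed).
    sigma2 = 10 ** (-snr_db / 10)
    n = np.sqrt(sigma2 / 2) * (rng.standard_normal(m) + 1j * rng.standard_normal(m))
    return alpha * c + n

# One forward round: M = 180 unit-power complex symbols (matching M from Sec. 2).
c = np.exp(1j * rng.uniform(0, 2 * np.pi, 180))
y = awgn_fading_channel(c, snr_db=10.0, rng=rng)   # uplink reception at the AP
```

The downlink feedback channel has the same form with $\tilde{\alpha}$, $\tilde{w}$, and the real-valued feedback vector $\hat{c}^{(t)}$ in place of the codeword.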

2. Neural Network Architectures for Coding and Feedback

The UE encoder is a lightweight attention-based feedback code (AFC):

  • Inputs at round $t$: message bits $b$, past codewords $c^{(0:t-1)}$, received feedback $\hat{s}^{(0:t')}$, and SNR embedding $\xi^{(t)}=\mathrm{MLP}_\mathrm{SNR}(\gamma^{(t)})$.
  • Input stacking: $Q^{(t)} = [b; c^{(0)};\dots;c^{(t-1)};0;\hat{s}^{(0)};\dots;\hat{s}^{(t')};0;\xi^{(t)}]\in \mathbb{R}^{L\times d_\mathrm{model}}$.
  • Processing: $Q^{(t)}$ passes through $N_e$ transformer-like layers (LayerNorm, single-head self-attention, Add&Norm, position-wise feed-forward, Add&Norm).
  • Output: a linear mapping produces $c^{(t)}$.

Typical parameterization: $d_\mathrm{model}=32$, $d_\mathrm{ff}=64$, $N_e=2$, $M\approx180$, $\sim$21K total parameters.

The AP decoder is full-capacity, comprising $N_f=4$ transformer layers with multi-head attention ($H=4$, $d_k=16$, feed-forward dimension 256), outputting $\hat{c}^{(t)}\in\mathbb{R}^{128}$ for feedback, and a subsequent stack for final bit-probability decoding. Total decoder parameters are $\sim$40K. Both UE and AP employ SNR-conditioned embeddings and transformer architectures, with asymmetric complexity favoring lightweight on-device encoding (Chen et al., 8 Feb 2026).
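To make the encoder's layer structure concrete, the following NumPy sketch runs a stacked input through $N_e=2$ single-head self-attention layers with a position-wise feed-forward network at the stated dimensions ($d_\mathrm{model}=32$, $d_\mathrm{ff}=64$). The pre-LN residual placement, ReLU activation, weight scale, and sharing one parameter set across layers are simplifications for brevity, not details from the paper:

```python
import numpy as np

def layer_norm(x, eps=1e-5):
    """Normalize each row (token) to zero mean, unit variance."""
    return (x - x.mean(-1, keepdims=True)) / np.sqrt(x.var(-1, keepdims=True) + eps)

def self_attention(x, Wq, Wk, Wv):
    """Single-head scaled dot-product self-attention over the token dimension."""
    q, k, v = x @ Wq, x @ Wk, x @ Wv
    scores = q @ k.T / np.sqrt(k.shape[-1])
    w = np.exp(scores - scores.max(-1, keepdims=True))   # numerically stable softmax
    return (w / w.sum(-1, keepdims=True)) @ v

def encoder_layer(x, p):
    """One transformer-like layer: attention + residual, then FFN + residual."""
    h = x + self_attention(layer_norm(x), p["Wq"], p["Wk"], p["Wv"])
    ff = np.maximum(0.0, layer_norm(h) @ p["W1"]) @ p["W2"]
    return h + ff

d_model, d_ff, L = 32, 64, 8
rng = np.random.default_rng(0)
p = {k: 0.1 * rng.standard_normal(s) for k, s in {
    "Wq": (d_model, d_model), "Wk": (d_model, d_model), "Wv": (d_model, d_model),
    "W1": (d_model, d_ff), "W2": (d_ff, d_model)}.items()}

x = rng.standard_normal((L, d_model))   # stacked input Q^(t), L tokens
for _ in range(2):                      # N_e = 2 encoder layers
    x = encoder_layer(x, p)
```

A final linear projection (omitted here) would map the processed tokens to the codeword $c^{(t)}$.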

3. Asynchronous Feedback Code Construction

Classical ARQ implementations are synchronous: the UE's encoder must idle at round $t$ until feedback for round $t-1$ has arrived, introducing stalls whenever feedback is delayed. Rich-ARQ's AFC eliminates this bottleneck by allowing the encoder at round $t$ to use any feedback received up to round $t-1$; formally, if

$$t' = \max\{ r : \text{feedback } \hat{y}^{(r)} \text{ received}\} \leq t-1,$$

then

$$c^{(t)} = f_\text{tx}\big(b, c^{(0:t-1)}, \hat{y}^{(0:t')}, \gamma^{(0:t')}\big).$$

This architecture overlaps the forward and feedback pipelines, preventing encoder stalls. If the forward slot duration is $d=T_\text{enc}+T_\text{tx}$ and the feedback slot duration is $f=T_\text{fb}$, the synchronous total latency over $T$ rounds is

$$D_\text{sync} = T\cdot d + (T-1)\cdot f,$$

while the asynchronous AFC achieves

$$D_\text{async} = T\cdot d' + (T-1)\cdot f + d,$$

with effective forward interval $d' = (d - f)/2$ (if $d > f$). In prototype settings ($d\approx 10$ ms, $f\approx 4$ ms, $T=9$), this design yields $D_\text{sync}\approx122$ ms vs. $D_\text{async}\approx69$ ms, a $43\%$ reduction in total latency (Chen et al., 8 Feb 2026).
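The latency formulas can be checked numerically against the prototype numbers ($d=10$ ms, $f=4$ ms, $T=9$). The behavior when $d \le f$ is not specified in the text, so the fallback below is an assumption:

```python
def arq_latency_ms(d, f, T):
    """Total latency (ms) of synchronous ARQ vs. the asynchronous AFC pipeline."""
    d_eff = (d - f) / 2 if d > f else d      # effective forward interval d' (d <= f case assumed)
    d_sync = T * d + (T - 1) * f             # T forward slots plus T-1 feedback slots
    d_async = T * d_eff + (T - 1) * f + d    # overlapped pipeline plus one trailing forward slot
    return d_sync, d_async

d_sync, d_async = arq_latency_ms(d=10, f=4, T=9)
print(d_sync, d_async)   # prints: 122 69.0
```

The reduction is $1 - 69/122 \approx 43\%$, matching the figure quoted above.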

4. Joint Training Strategy and Loss Objectives

Rich-ARQ performs joint end-to-end training of the encoder ($\theta_\text{tx}$) and decoder ($\theta_\text{rx}$) across randomized messages and channel realizations:

$$L(\theta_\text{tx}, \theta_\text{rx}) = \mathbb{E}_{b, \alpha, \tilde{\alpha}, \sigma^2}\!\left[ \ell_\text{block}\big(b, \hat{b}^{(T)}\big) \right],$$

where the block error loss uses standard bit-wise cross-entropy:

$$\ell_\text{block}(b,\hat{b}) = -\sum_{k=1}^{K} \left[ b_k\log\hat{b}_k + (1-b_k)\log(1-\hat{b}_k) \right].$$

Robustness across SNR regimes is encouraged by SNR-conditioned curriculum training with Langevin perturbations. Each mini-batch SNR is sampled from the convex mixture $\gamma_\text{mix} \sim \alpha(k)P_\text{orig} + (1-\alpha(k))P_\text{targ}$ and then jittered with $\epsilon \sim \mathcal{N}(0,\sigma_\text{pert}^2)$. The mixture weight $\alpha(k)$ is annealed over the curriculum to enhance generalization, and the perturbations emulate real-world SNR jitter (Chen et al., 8 Feb 2026).
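A hedged sketch of the curriculum sampler follows. The source specifies neither the form of $P_\text{orig}$ and $P_\text{targ}$ nor the annealing schedule, so the unit-variance Gaussian components and linear annealing below are assumptions for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_batch_snr(k, k_max, snr_orig, snr_targ, sigma_pert=0.5, rng=rng):
    """Draw one mini-batch SNR from the annealed mixture, then perturb it."""
    alpha = max(0.0, 1.0 - k / k_max)        # alpha(k): annealed 1 -> 0 (assumed linear)
    if rng.random() < alpha:
        snr = rng.normal(snr_orig, 1.0)      # sample from P_orig (assumed Gaussian)
    else:
        snr = rng.normal(snr_targ, 1.0)      # sample from P_targ (assumed Gaussian)
    return snr + rng.normal(0.0, sigma_pert) # Langevin-style perturbation epsilon

# Early in the curriculum samples cluster near the original SNR, later near the target.
snrs = [sample_batch_snr(k, 1000, snr_orig=10.0, snr_targ=0.0) for k in range(1000)]
```

Each training step would then simulate the channel at the drawn SNR before applying the cross-entropy loss above.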

5. Theoretical Performance and Coverage Gains

Let $\mathrm{SNR}_\mathrm{HARQ}(\mathrm{PE})$ denote the SNR required by 1-bit HARQ to achieve packet error rate $\mathrm{PE}$, and $\mathrm{SNR}_\mathrm{Rich}$ the corresponding requirement for Rich-ARQ. The observed gap $\Delta\gamma = \mathrm{SNR}_\mathrm{HARQ} - \mathrm{SNR}_\mathrm{Rich}$ is $8.8$ dB (Turbo-HARQ) and $9.5$ dB (Polar-HARQ) at $\mathrm{PE}=10^{-4}$.

Coverage improvements are derived via the log-distance path-loss model $PL(d)=PL_0+10n\log_{10}(d/d_0)$, with maximum range scaling

$$d_\mathrm{max} \propto 10^{(P_\mathrm{tx} + G_\mathrm{tx} + G_\mathrm{rx} - S_\mathrm{rx})/(10n)},$$

and the Rich-ARQ coverage ratio is $d_{\mathrm{max},\mathrm{Rich}}/d_{\mathrm{max},\mathrm{HARQ}} = 10^{\Delta\gamma / (10n)}$.

For $n=3$ and $\Delta\gamma\approx8.8$ dB (Turbo), one obtains a $1.38\times$ coverage extension; for Polar, $1.70\times$ ($70\%$ farther). This suggests that AP density can be reduced by $47\%$ (Turbo) or $65\%$ (Polar) for an equivalent error target (Chen et al., 8 Feb 2026).
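The AP-density figures follow from the coverage ratios under a uniform two-dimensional deployment, where each AP's covered area scales with the square of its range; a quick check:

```python
def ap_density_reduction(coverage_ratio):
    """Fraction of APs saved when per-AP range grows by `coverage_ratio`,
    assuming a uniform 2-D deployment (covered area scales with range squared)."""
    return 1.0 - 1.0 / coverage_ratio ** 2

print(round(ap_density_reduction(1.38) * 100))   # -> 47  (Turbo)
print(round(ap_density_reduction(1.70) * 100))   # -> 65  (Polar)
```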

6. Experimental Setup, Findings, and Comparative Metrics

A full-stack, standards-compliant prototype based on USRP X310 radios and an OFDM PHY (15 kHz subcarrier spacing, 1 PRB = 12 subcarriers $\times$ 1 ms) implements the Rich-ARQ framework. The experimental protocol comprises downlink synchronization, periodic uplink grants, and Rich-ARQ sessions over multiple rounds.

Key results include:

  • At $\mathrm{PER}=10^{-4}$, Rich-ARQ requires $8.8$ dB ($9.5$ dB) less SNR than Turbo-HARQ (Polar-HARQ).
  • Coverage increases by $1.38\times$/$1.70\times$, implying $47\%$/$65\%$ fewer APs at the target PER.
  • Rich-ARQ exhibits monotonic robustness across SNRs, in contrast to previous deep-learning-based feedback methods (e.g., GBAF fails away from its training SNR).
  • End-to-end latency drops from $D_\text{sync} \approx 122$ ms (synchronous) to $D_\text{async} \approx 69$ ms (AFC), a $43\%$ improvement. Across various slot settings, the AFC achieves $46$–$70\%$ lower effective forward intervals.
  • Encoder model/compute: 21K parameters and 444K FLOPs (UE) vs. 40K/661K (AP); encoder runtime on GPU is $67.5$ ms (Rich-ARQ) versus $111$ ms (vanilla), a $39\%$ speedup. Estimated FPGA inference times are $0.08$ ms (Kintex-7) and $0.56$ ms (Spartan-7) (Chen et al., 8 Feb 2026).

7. Practical Implementation and Deployment Considerations

Rich-ARQ uses a non-blocking, deadline-aware architecture:

  • The PHY thread never blocks on encoder network inference. Upon an uplink PRB, it asynchronously requests encoder output; an empty slot is transmitted if not ready by the deadline.
  • At the AP, feedback and final decoding are decoupled—feedback generation executes in a low-latency thread, while final decoding aggregates all forward receptions.
  • Encoder complexity is minimized for on-device use via pruning, sparse feed-forward, and asymmetric model size.
  • FPGA-friendly network designs (compact transformer/MLP layers) enable (sub-)ms inference latencies.
  • Robustness to real-world channel/SNR fluctuations is secured by SNR-conditioned curricula with stochastic perturbation during training.
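The non-blocking PHY behavior in the first bullet can be sketched with a worker thread and a deadline on the result queue; the slot timings, queue-based hand-off, and names below are illustrative, not the prototype's actual implementation:

```python
import queue
import threading
import time

def encoder_worker(requests, results):
    """Runs neural-encoder inference off the PHY thread so TX never blocks on it."""
    while True:
        round_idx = requests.get()
        if round_idx is None:            # shutdown sentinel
            return
        time.sleep(0.005)                # stand-in for a few ms of network inference
        results.put((round_idx, f"codeword-{round_idx}"))

requests, results = queue.Queue(), queue.Queue()
threading.Thread(target=encoder_worker, args=(requests, results), daemon=True).start()

transmitted = []
for t in range(3):
    requests.put(t)                      # request the round-t codeword asynchronously
    try:
        # The PHY waits at most one slot deadline; on timeout it sends an empty slot.
        _, codeword = results.get(timeout=0.2)
        transmitted.append(codeword)
    except queue.Empty:
        transmitted.append("empty-slot")
requests.put(None)
```

Shrinking the `timeout` below the inference time reproduces the empty-slot fallback; the PHY loop's cadence is never disturbed either way.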

Rich-ARQ demonstrates compatibility with existing 4G/5G OFDM PHYs, showing that high-dimensional, neural-coded “rich” inference-time ARQ can deliver marked improvements in reliability, coverage, and latency, even under feedback delay, SNR variation, and practical hardware constraints (Chen et al., 8 Feb 2026).

