Inference-Time ARQ: Neural-Coded Feedback
- Inference-Time ARQ is a paradigm that replaces traditional 1-bit acknowledgments with high-dimensional, neural-coded feedback to enhance wireless reliability and coverage.
- It employs asynchronous feedback, transformer-based encoding and decoding, and SNR-conditioned training to minimize latency and boost system performance.
- Experimental findings demonstrate up to 43% latency reduction, 8.8–9.5 dB SNR gains, and significant access point density reductions, underscoring its practical benefits.
Inference-time ARQ refers to a paradigm in wireless communications where the feedback mechanism during automatic repeat request (ARQ) exploits high-dimensional, information-rich vector feedback in place of traditional 1-bit acknowledgments. This approach transforms feedback from passive acknowledgment to an active collaboration between user equipment (UE) and access point (AP), realized by neural-coded feedback tightly integrated with physical-layer channel coding. The Rich-ARQ scheme exemplifies inference-time ARQ, introducing neural-network based encoders and decoders, asynchronous feedback code design, and a full-stack radio prototype that decouples inference from strict radio timing to address coverage, reliability, and latency bottlenecks in next-generation wireless systems (Chen et al., 8 Feb 2026).
1. System Model and Channel Description
The Rich-ARQ system is deployed on a star-topology link (single UE to AP) layered over an OFDM physical layer. Let $\mathbf{b} \in \{0,1\}^K$ denote the information bits. At forward round $t$, the UE uses a neural encoder $f_\theta$ to generate the codeword $\mathbf{x}_t \in \mathbb{C}^{N}$ (where $N$ is the number of symbols per PRB):

$$\mathbf{x}_t = f_\theta\!\left(\mathbf{b},\, \mathbf{x}_{1:t-1},\, \mathbf{z}_{1:t'}\right),$$

with $\mathbf{z}_{1:t'}$ the latest feedback received. The uplink channel for subcarrier $k$ is

$$y_t[k] = h_t[k]\, x_t[k] + n_t[k],$$

where $h_t[k]$ is the frequency-domain channel gain and $n_t[k] \sim \mathcal{CN}(0, \sigma_u^2)$ (AWGN).

At the AP, a neural feedback decoder $g_\phi$ generates a high-dimensional downlink feedback vector $\mathbf{z}_t$:

$$\mathbf{z}_t = g_\phi\!\left(\mathbf{y}_{1:t},\, \hat{\gamma}_t\right),$$

where $\hat{\gamma}_t$ is the instantaneous received-SNR estimate. The feedback channel for subcarrier $k$ is

$$r_t[k] = g_t[k]\, z_t[k] + w_t[k],$$

with $g_t[k]$ and $w_t[k] \sim \mathcal{CN}(0, \sigma_d^2)$ denoting the downlink channel gain and AWGN, respectively (Chen et al., 8 Feb 2026).
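As an illustration, the per-subcarrier uplink model $y_t[k] = h_t[k]\,x_t[k] + n_t[k]$ can be sketched in a few lines of Python. Rayleigh fading and unit-power symbols are assumptions of this sketch, not specified details of the prototype.

```python
import math
import random

def awgn_subcarrier_channel(x, snr_db, rng):
    """Per-subcarrier uplink model y[k] = h[k]*x[k] + n[k], with
    Rayleigh-fading gains h[k] and complex AWGN n[k] (illustrative)."""
    sigma = math.sqrt(10 ** (-snr_db / 10) / 2)  # per-dimension noise std
    y, h = [], []
    for xk in x:
        hk = complex(rng.gauss(0, math.sqrt(0.5)), rng.gauss(0, math.sqrt(0.5)))
        nk = complex(rng.gauss(0, sigma), rng.gauss(0, sigma))
        h.append(hk)
        y.append(hk * xk + nk)
    return y, h

rng = random.Random(0)
prb = [1 + 0j] * 12  # one PRB: 12 unit-power subcarrier symbols
y, h = awgn_subcarrier_channel(prb, snr_db=10.0, rng=rng)
```

At 10 dB SNR the received symbols stay close to the faded transmit symbols, which is the regime in which the feedback decoder's SNR conditioning matters most.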
2. Neural Network Architectures for Coding and Feedback
The UE encoder is a lightweight attention-based feedback code (AFC):
- Inputs at round $t$: message bits $\mathbf{b}$, past codewords $\mathbf{x}_{1:t-1}$, received feedback $\mathbf{z}_{1:t'}$ (up to the latest round $t'$ whose feedback has arrived), and an SNR embedding $\mathbf{e}_\gamma$.
- Input stacking: $\mathbf{S}_t = \left[\mathbf{b};\, \mathbf{x}_{1:t-1};\, \mathbf{z}_{1:t'};\, \mathbf{e}_\gamma\right]$.
- Processing: $\mathbf{S}_t$ is fed through transformer-like layers (comprising LayerNorm, single-head self-attention, Add&Norm, position-wise feed-forward, Add&Norm).
- Output: a linear mapping produces the codeword $\mathbf{x}_t$.

The encoder is deliberately compact, totaling approximately 21K parameters.
The AP decoder is full-capacity, comprising transformer layers with multi-head self-attention (feed-forward dimension 256) that output the feedback vector $\mathbf{z}_t$, plus a subsequent stack for final bit-probability decoding. Total decoder parameters are approximately 40K. Both UE and AP employ SNR-conditioned embeddings and transformer architectures, with asymmetric complexity favoring lightweight on-device encoding (Chen et al., 8 Feb 2026).
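A minimal NumPy sketch of one such single-head, pre-norm encoder layer (LayerNorm → self-attention → residual, then feed-forward → residual). The widths, token count, weight scales, and tanh nonlinearity below are illustrative choices, not the paper's exact parameterization.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 16  # hypothetical model width
L = 8   # tokens: stacked bits / past codewords / feedback / SNR embedding

def layer_norm(x, eps=1e-5):
    """Normalize each token vector to zero mean, unit variance."""
    return (x - x.mean(-1, keepdims=True)) / np.sqrt(x.var(-1, keepdims=True) + eps)

def single_head_attention(x, Wq, Wk, Wv):
    """Scaled dot-product self-attention with one head."""
    q, k, v = x @ Wq, x @ Wk, x @ Wv
    s = q @ k.T / np.sqrt(x.shape[-1])
    w = np.exp(s - s.max(-1, keepdims=True))
    w /= w.sum(-1, keepdims=True)          # row-wise softmax
    return w @ v

def encoder_layer(x, params):
    Wq, Wk, Wv, W1, W2 = params
    x = x + single_head_attention(layer_norm(x), Wq, Wk, Wv)  # attn + residual
    x = x + np.tanh(layer_norm(x) @ W1) @ W2                  # FFN + residual
    return x

params = [rng.standard_normal((d, d)) * 0.1 for _ in range(3)] + \
         [rng.standard_normal((d, 2 * d)) * 0.1, rng.standard_normal((2 * d, d)) * 0.1]
S = rng.standard_normal((L, d))  # stacked encoder input S_t
out = encoder_layer(S, params)
```

The asymmetric complexity noted above would correspond to stacking few such layers (small `d`) on the UE and more, wider, multi-head layers at the AP.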
3. Asynchronous Feedback Code Construction
Classical ARQ implementations are synchronous: the UE's encoder must idle at round $t$ until the feedback for round $t-1$ has arrived, introducing stalls whenever there is feedback delay. Rich-ARQ's AFC eliminates this bottleneck by allowing the encoder at round $t$ to utilize any feedback received up to that point; formally, if $t' < t$ is the latest round whose feedback has arrived by the start of round $t$, then

$$\mathbf{x}_t = f_\theta\!\left(\mathbf{b},\, \mathbf{x}_{1:t-1},\, \mathbf{z}_{1:t'}\right).$$

This architecture overlaps the forward and feedback pipelines, preventing encoder stalls. If the forward slot duration is $T_f$ and the feedback slot duration is $T_b$, the synchronous total latency over $T$ rounds is

$$L_{\mathrm{sync}} = T \left(T_f + T_b\right),$$

while the asynchronous AFC achieves

$$L_{\mathrm{async}} = T\, T_{\mathrm{eff}} + T_b,$$

with effective forward interval $T_{\mathrm{eff}} = \max(T_f, T_b)$ (so $T_{\mathrm{eff}} = T_f$ if $T_f \ge T_b$). In the prototype's slot settings, this design reduces total latency by roughly 43% relative to the synchronous baseline (Chen et al., 8 Feb 2026).
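The latency accounting can be made concrete with a short sketch. Synchronous rounds serialize forward and feedback slots, while the asynchronous pipeline overlaps them so only the final feedback adds to the tail; the slot durations and round count below are hypothetical values chosen to illustrate a roughly 43% reduction, not measurements from the prototype.

```python
def latency_sync(rounds, t_f, t_b):
    """Synchronous ARQ: every round waits for its feedback slot."""
    return rounds * (t_f + t_b)

def latency_async(rounds, t_f, t_b):
    """Asynchronous AFC: forward slots pipeline at the effective interval
    max(t_f, t_b); only the last feedback slot extends the tail."""
    return rounds * max(t_f, t_b) + t_b

# Hypothetical 1 ms forward/feedback slots over 7 rounds.
sync = latency_sync(7, 1.0, 1.0)     # 14 ms
async_ = latency_async(7, 1.0, 1.0)  # 8 ms
reduction = 1 - async_ / sync        # ~0.43
```

The reduction grows with the number of rounds, since only a single feedback slot remains outside the overlapped pipeline.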
4. Joint Training Strategy and Loss Objectives
Rich-ARQ performs joint end-to-end training of encoder ($f_\theta$) and decoder ($g_\phi$) across randomized messages and channel realizations:

$$\min_{\theta, \phi}\; \mathbb{E}_{\mathbf{b},\, h,\, n}\!\left[\mathcal{L}\!\left(\mathbf{b}, \hat{\mathbf{p}}\right)\right],$$

where the block error loss uses the standard bit-wise cross-entropy

$$\mathcal{L}\!\left(\mathbf{b}, \hat{\mathbf{p}}\right) = -\sum_{i=1}^{K} \left[b_i \log \hat{p}_i + (1 - b_i) \log\!\left(1 - \hat{p}_i\right)\right],$$

with $\hat{p}_i$ the decoder's estimated probability that bit $i$ equals one.

Robustness across SNR regimes is encouraged by SNR-conditioned curriculum training with Langevin-style perturbations: each mini-batch SNR is sampled from a convex mixture of two SNR distributions, with additive Gaussian jitter. The mixture weight is annealed over the curriculum to enhance generalization, and the perturbations emulate real-world SNR jitter (Chen et al., 8 Feb 2026).
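A sketch of the bit-wise cross-entropy loss and a plausible curriculum SNR sampler. The mixture components (a broad uniform range versus a fixed target point), the numeric SNR bounds, and the jitter scale are assumptions for illustration; the paper's exact mixture and annealing schedule are not reproduced here.

```python
import math
import random

def bitwise_bce(bits, p_hat, eps=1e-9):
    """Bit-wise cross-entropy between message bits and decoded probabilities."""
    total = 0.0
    for b, p in zip(bits, p_hat):
        p = min(max(p, eps), 1 - eps)  # numerical guard
        total -= b * math.log(p) + (1 - b) * math.log(1 - p)
    return total

def sample_snr(rng, mix_weight, lo=-5.0, hi=20.0, target=5.0, jitter_std=1.0):
    """Curriculum SNR draw: convex mixture of a broad uniform range and a
    target operating point, plus Gaussian (Langevin-style) jitter.
    All numeric values are hypothetical."""
    base = rng.uniform(lo, hi) if rng.random() < mix_weight else target
    return base + rng.gauss(0.0, jitter_std)
```

Annealing `mix_weight` over training shifts mass from broad exploration toward the operating point while the jitter keeps the model robust to SNR estimation error.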
5. Theoretical Performance and Coverage Gains
Let $\gamma_{\text{1b}}(\varepsilon)$ denote the SNR required by 1-bit HARQ to achieve packet error rate $\varepsilon$, and $\gamma_{\text{rich}}(\varepsilon)$ the corresponding requirement for Rich-ARQ. The observed gap $\Delta\gamma = \gamma_{\text{1b}} - \gamma_{\text{rich}}$ is $8.8$ dB against Turbo-HARQ and $9.5$ dB against Polar-HARQ at the target error rate.

Coverage improvements are derived via the log-distance path-loss model $PL(d) = PL_0 + 10\,\eta \log_{10}(d / d_0)$ with path-loss exponent $\eta$: the maximum range scales as

$$\frac{d_{\max}^{\text{rich}}}{d_{\max}^{\text{1b}}} = 10^{\Delta\gamma / (10 \eta)},$$

and the Rich-ARQ coverage-area ratio is $\rho = \left(d_{\max}^{\text{rich}} / d_{\max}^{\text{1b}}\right)^2 = 10^{\Delta\gamma / (5 \eta)}$.

For realistic path-loss exponents, the $8.8$ dB (Turbo) and $9.5$ dB (Polar) gaps translate into substantial coverage extensions, with the Polar comparison yielding the larger range gain. This suggests that AP density can be reduced by the factor $\rho$ for an equivalent error target (Chen et al., 8 Feb 2026).
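The coverage algebra above is straightforward to evaluate numerically. The path-loss exponent used below is a hypothetical urban-macro value, not one taken from the paper.

```python
def coverage_gain(delta_gamma_db, path_loss_exp):
    """Range and area (AP-density) factors implied by an SNR margin under
    log-distance path loss PL(d) = PL0 + 10*eta*log10(d/d0)."""
    range_ratio = 10 ** (delta_gamma_db / (10 * path_loss_exp))
    area_ratio = range_ratio ** 2  # rho: each AP covers this many times more area
    return range_ratio, area_ratio

# Hypothetical path-loss exponent eta = 3.5.
for name, gap_db in [("Turbo-HARQ", 8.8), ("Polar-HARQ", 9.5)]:
    r, rho = coverage_gain(gap_db, 3.5)
    print(f"vs {name}: range x{r:.2f}, coverage area x{rho:.2f}")
```

Because $\rho$ depends exponentially on $\Delta\gamma/\eta$, even modest differences in the assumed path-loss exponent change the implied AP-density savings noticeably.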
6. Experimental Setup, Findings, and Comparative Metrics
A full-stack, standard-compliant prototype based on USRP X310 radios and an OFDM PHY (15 kHz subcarrier spacing, 1 PRB = 12 subcarriers × 1 ms) implements the Rich-ARQ framework. The experimental protocol comprises downlink synchronization, periodic uplink grants, and Rich-ARQ sessions over multiple rounds.
Key results include:
- At the target packet error rate, Rich-ARQ requires $8.8$ dB ($9.5$ dB) less SNR than Turbo-HARQ (Polar-HARQ).
- The coverage-area ratio increases substantially against both baselines, implying correspondingly fewer APs required at the target PER.
- Rich-ARQ exhibits monotonic robustness across SNRs, in contrast to previous deep learning-based feedback methods (e.g., GBAF fails away from training SNR).
- End-to-end latency reduction: the asynchronous AFC cuts total latency by roughly 43% relative to synchronous operation. Across various slot settings, AFC achieves lower effective forward intervals.
- Encoder model/compute: 21K parameters and 444K FLOPs (UE) vs. 40K/661K (AP); encoder runtime on GPU is $67.5$ ms (Rich-ARQ) versus $111$ ms (vanilla), a roughly $1.6\times$ speedup. Estimated FPGA inference times are $0.08$ ms (Kintex-7) or $0.56$ ms (Spartan-7) (Chen et al., 8 Feb 2026).
7. Practical Implementation and Deployment Considerations
Rich-ARQ uses a non-blocking, deadline-aware architecture:
- The PHY thread never blocks on encoder network inference. When an uplink PRB becomes available, it asynchronously requests the encoder output; if the output is not ready by the transmit deadline, an empty slot is sent instead.
- At the AP, feedback and final decoding are decoupled—feedback generation executes in a low-latency thread, while final decoding aggregates all forward receptions.
- Encoder complexity is minimized for on-device use via pruning, sparse feed-forward, and asymmetric model size.
- FPGA-friendly network designs (compact transformer/MLP layers) enable (sub-)ms inference latencies.
- Robustness to real-world channel/SNR fluctuations is secured by SNR-conditioned curricula with stochastic perturbation during training.
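The non-blocking, deadline-aware pattern in the first bullet can be sketched with a queue and a timeout. The names and timings below are illustrative, not taken from the prototype's codebase.

```python
import queue
import threading
import time

def phy_tx_slot(encoder_q, deadline_s):
    """Deadline-aware PHY slot: use the encoder's codeword if it arrived in
    time, otherwise transmit an empty slot so radio timing never blocks on
    neural-network inference."""
    try:
        return encoder_q.get(timeout=deadline_s)
    except queue.Empty:
        return None  # empty slot; inference finishes in the background

# Simulate an inference call that misses the first slot deadline.
q = queue.Queue()
threading.Thread(target=lambda: (time.sleep(0.05), q.put("codeword")),
                 daemon=True).start()
first = phy_tx_slot(q, deadline_s=0.01)   # not ready yet -> empty slot
second = phy_tx_slot(q, deadline_s=0.2)   # ready by the next deadline
```

The same decoupling applies at the AP, where the low-latency feedback thread and the final bit-decoding stack run independently of each other.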
Rich-ARQ demonstrates compatibility with existing 4G/5G OFDM PHYs, showing that high-dimensional, neural-coded “rich” inference-time ARQ can deliver marked improvements in reliability, coverage, and latency, even under feedback delay, SNR variation, and practical hardware constraints (Chen et al., 8 Feb 2026).