Inference-Time ARQ: Neural-Coded Feedback
- Inference-Time ARQ is a paradigm that replaces traditional 1-bit acknowledgments with high-dimensional, neural-coded feedback to enhance wireless reliability and coverage.
- It employs asynchronous feedback, transformer-based encoding and decoding, and SNR-conditioned training to minimize latency and boost system performance.
- Experimental findings demonstrate up to 43% latency reduction, 8.8–9.5 dB SNR gains, and significant access point density reductions, underscoring its practical benefits.
Inference-time ARQ refers to a paradigm in wireless communications where the feedback mechanism during automatic repeat request (ARQ) exploits high-dimensional, information-rich vector feedback in place of traditional 1-bit acknowledgments. This approach transforms feedback from passive acknowledgment to an active collaboration between user equipment (UE) and access point (AP), realized by neural-coded feedback tightly integrated with physical-layer channel coding. The Rich-ARQ scheme exemplifies inference-time ARQ, introducing neural-network based encoders and decoders, asynchronous feedback code design, and a full-stack radio prototype that decouples inference from strict radio timing to address coverage, reliability, and latency bottlenecks in next-generation wireless systems (Chen et al., 8 Feb 2026).
1. System Model and Channel Description
The Rich-ARQ system is deployed on a star-topology link (single UE to AP) layered over an OFDM physical layer. Let $\mathbf{b} \in \{0,1\}^K$ denote the information bits. At forward round $t$, the UE uses a neural encoder $f_\theta$ to generate the codeword $\mathbf{x}_t \in \mathbb{C}^{N}$ (where $N$ is the number of symbols per PRB):

$$\mathbf{x}_t = f_\theta\!\left(\mathbf{b},\, \mathbf{x}_{1:t-1},\, \mathbf{z}_{1:t'}\right),$$

with $\mathbf{z}_{1:t'}$ the latest feedback received. The uplink channel for subcarrier $k$ is

$$y_t[k] = h_t[k]\, x_t[k] + n_t[k],$$

where $h_t[k]$ is the frequency-domain channel gain and $n_t[k] \sim \mathcal{CN}(0, \sigma_u^2)$ (AWGN).

At the AP, a neural feedback decoder $g_\phi$ generates a high-dimensional downlink feedback vector $\mathbf{z}_t$:

$$\mathbf{z}_t = g_\phi\!\left(\mathbf{y}_{1:t},\, \hat{\gamma}_t\right),$$

where $\hat{\gamma}_t$ is the instantaneous received-SNR estimate. The feedback channel for subcarrier $k$ is

$$r_t[k] = g_t[k]\, z_t[k] + w_t[k],$$

with $g_t[k]$ and $w_t[k] \sim \mathcal{CN}(0, \sigma_d^2)$ denoting the downlink channel gain and AWGN, respectively (Chen et al., 8 Feb 2026).
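As an illustration, the per-subcarrier uplink model $y_t[k] = h_t[k]\,x_t[k] + n_t[k]$ can be sketched in a few lines of Python. Rayleigh fading and unit-power symbols are assumptions of this sketch, not specified details of the prototype.

```python
import math
import random

def awgn_subcarrier_channel(x, snr_db, rng):
    """Per-subcarrier uplink model y[k] = h[k]*x[k] + n[k], with
    Rayleigh-fading gains h[k] and complex AWGN n[k] (illustrative)."""
    sigma = math.sqrt(10 ** (-snr_db / 10) / 2)  # per-dimension noise std
    y, h = [], []
    for xk in x:
        hk = complex(rng.gauss(0, math.sqrt(0.5)), rng.gauss(0, math.sqrt(0.5)))
        nk = complex(rng.gauss(0, sigma), rng.gauss(0, sigma))
        h.append(hk)
        y.append(hk * xk + nk)
    return y, h

rng = random.Random(0)
prb = [1 + 0j] * 12  # one PRB: 12 unit-power subcarrier symbols
y, h = awgn_subcarrier_channel(prb, snr_db=10.0, rng=rng)
```

At 10 dB SNR the received symbols stay close to the faded transmit symbols, which is the regime in which the feedback decoder's SNR conditioning matters most.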
2. Neural Network Architectures for Coding and Feedback
The UE encoder is a lightweight attention-based feedback code (AFC):
- Inputs at round $t$: message bits $\mathbf{b}$, past codewords $\mathbf{x}_{1:t-1}$, received feedback $\mathbf{z}_{1:t'}$ (up to the latest round $t'$ whose feedback has arrived), and an SNR embedding $\mathbf{e}_\gamma$.
- Input stacking: $\mathbf{S}_t = \left[\mathbf{b};\, \mathbf{x}_{1:t-1};\, \mathbf{z}_{1:t'};\, \mathbf{e}_\gamma\right]$.
- Processing: $\mathbf{S}_t$ is fed through transformer-like layers (comprising LayerNorm, single-head self-attention, Add&Norm, position-wise feed-forward, Add&Norm).
- Output: a linear mapping produces the codeword $\mathbf{x}_t$.

The encoder is deliberately compact, totaling approximately 21K parameters.
The AP decoder is full-capacity, comprising transformer layers with multi-head self-attention (feed-forward dimension 256) that output the feedback vector $\mathbf{z}_t$, plus a subsequent stack for final bit-probability decoding. Total decoder parameters are approximately 40K. Both UE and AP employ SNR-conditioned embeddings and transformer architectures, with asymmetric complexity favoring lightweight on-device encoding (Chen et al., 8 Feb 2026).
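A minimal NumPy sketch of one such single-head, pre-norm encoder layer (LayerNorm → self-attention → residual, then feed-forward → residual). The widths, token count, weight scales, and tanh nonlinearity below are illustrative choices, not the paper's exact parameterization.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 16  # hypothetical model width
L = 8   # tokens: stacked bits / past codewords / feedback / SNR embedding

def layer_norm(x, eps=1e-5):
    """Normalize each token vector to zero mean, unit variance."""
    return (x - x.mean(-1, keepdims=True)) / np.sqrt(x.var(-1, keepdims=True) + eps)

def single_head_attention(x, Wq, Wk, Wv):
    """Scaled dot-product self-attention with one head."""
    q, k, v = x @ Wq, x @ Wk, x @ Wv
    s = q @ k.T / np.sqrt(x.shape[-1])
    w = np.exp(s - s.max(-1, keepdims=True))
    w /= w.sum(-1, keepdims=True)          # row-wise softmax
    return w @ v

def encoder_layer(x, params):
    Wq, Wk, Wv, W1, W2 = params
    x = x + single_head_attention(layer_norm(x), Wq, Wk, Wv)  # attn + residual
    x = x + np.tanh(layer_norm(x) @ W1) @ W2                  # FFN + residual
    return x

params = [rng.standard_normal((d, d)) * 0.1 for _ in range(3)] + \
         [rng.standard_normal((d, 2 * d)) * 0.1, rng.standard_normal((2 * d, d)) * 0.1]
S = rng.standard_normal((L, d))  # stacked encoder input S_t
out = encoder_layer(S, params)
```

The asymmetric complexity noted above would correspond to stacking few such layers (small `d`) on the UE and more, wider, multi-head layers at the AP.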
3. Asynchronous Feedback Code Construction
Classical ARQ implementations are synchronous: the UE's encoder must idle at round $t$ until the feedback for round $t-1$ has arrived, introducing stalls whenever there is feedback delay. Rich-ARQ's AFC eliminates this bottleneck by allowing the encoder at round $t$ to utilize any feedback received up to that point; formally, if $t' < t$ is the latest round whose feedback has arrived by the start of round $t$, then

$$\mathbf{x}_t = f_\theta\!\left(\mathbf{b},\, \mathbf{x}_{1:t-1},\, \mathbf{z}_{1:t'}\right).$$

This architecture overlaps the forward and feedback pipelines, preventing encoder stalls. If the forward slot duration is $T_f$ and the feedback slot duration is $T_b$, the synchronous total latency over $T$ rounds is

$$L_{\mathrm{sync}} = T \left(T_f + T_b\right),$$

while the asynchronous AFC achieves

$$L_{\mathrm{async}} = T\, T_{\mathrm{eff}} + T_b,$$

with effective forward interval $T_{\mathrm{eff}} = \max(T_f, T_b)$ (so $T_{\mathrm{eff}} = T_f$ if $T_f \ge T_b$). In the prototype's slot settings, this design reduces total latency by roughly 43% relative to the synchronous baseline (Chen et al., 8 Feb 2026).
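The latency accounting can be made concrete with a short sketch. Synchronous rounds serialize forward and feedback slots, while the asynchronous pipeline overlaps them so only the final feedback adds to the tail; the slot durations and round count below are hypothetical values chosen to illustrate a roughly 43% reduction, not measurements from the prototype.

```python
def latency_sync(rounds, t_f, t_b):
    """Synchronous ARQ: every round waits for its feedback slot."""
    return rounds * (t_f + t_b)

def latency_async(rounds, t_f, t_b):
    """Asynchronous AFC: forward slots pipeline at the effective interval
    max(t_f, t_b); only the last feedback slot extends the tail."""
    return rounds * max(t_f, t_b) + t_b

# Hypothetical 1 ms forward/feedback slots over 7 rounds.
sync = latency_sync(7, 1.0, 1.0)     # 14 ms
async_ = latency_async(7, 1.0, 1.0)  # 8 ms
reduction = 1 - async_ / sync        # ~0.43
```

The reduction grows with the number of rounds, since only a single feedback slot remains outside the overlapped pipeline.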
4. Joint Training Strategy and Loss Objectives
Rich-ARQ performs joint end-to-end training of encoder ($f_\theta$) and decoder ($g_\phi$) across randomized messages and channel realizations:

$$\min_{\theta, \phi}\; \mathbb{E}_{\mathbf{b},\, h,\, n}\!\left[\mathcal{L}\!\left(\mathbf{b}, \hat{\mathbf{p}}\right)\right],$$

where the block error loss uses the standard bit-wise cross-entropy

$$\mathcal{L}\!\left(\mathbf{b}, \hat{\mathbf{p}}\right) = -\sum_{i=1}^{K} \left[b_i \log \hat{p}_i + (1 - b_i) \log\!\left(1 - \hat{p}_i\right)\right],$$

with $\hat{p}_i$ the decoder's estimated probability that bit $i$ equals one.

Robustness across SNR regimes is encouraged by SNR-conditioned curriculum training with Langevin-style perturbations: each mini-batch SNR is sampled from a convex mixture of two SNR distributions, with additive Gaussian jitter. The mixture weight is annealed over the curriculum to enhance generalization, and the perturbations emulate real-world SNR jitter (Chen et al., 8 Feb 2026).
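A sketch of the bit-wise cross-entropy loss and a plausible curriculum SNR sampler. The mixture components (a broad uniform range versus a fixed target point), the numeric SNR bounds, and the jitter scale are assumptions for illustration; the paper's exact mixture and annealing schedule are not reproduced here.

```python
import math
import random

def bitwise_bce(bits, p_hat, eps=1e-9):
    """Bit-wise cross-entropy between message bits and decoded probabilities."""
    total = 0.0
    for b, p in zip(bits, p_hat):
        p = min(max(p, eps), 1 - eps)  # numerical guard
        total -= b * math.log(p) + (1 - b) * math.log(1 - p)
    return total

def sample_snr(rng, mix_weight, lo=-5.0, hi=20.0, target=5.0, jitter_std=1.0):
    """Curriculum SNR draw: convex mixture of a broad uniform range and a
    target operating point, plus Gaussian (Langevin-style) jitter.
    All numeric values are hypothetical."""
    base = rng.uniform(lo, hi) if rng.random() < mix_weight else target
    return base + rng.gauss(0.0, jitter_std)
```

Annealing `mix_weight` over training shifts mass from broad exploration toward the operating point while the jitter keeps the model robust to SNR estimation error.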
5. Theoretical Performance and Coverage Gains
Let $\gamma_{\text{1b}}(\varepsilon)$ denote the SNR required by 1-bit HARQ to achieve packet error rate $\varepsilon$, and $\gamma_{\text{rich}}(\varepsilon)$ the corresponding requirement for Rich-ARQ. The observed gap $\Delta\gamma = \gamma_{\text{1b}} - \gamma_{\text{rich}}$ is $8.8$ dB against Turbo-HARQ and $9.5$ dB against Polar-HARQ at the target error rate.

Coverage improvements are derived via the log-distance path-loss model $PL(d) = PL_0 + 10\,\eta \log_{10}(d / d_0)$ with path-loss exponent $\eta$: the maximum range scales as

$$\frac{d_{\max}^{\text{rich}}}{d_{\max}^{\text{1b}}} = 10^{\Delta\gamma / (10 \eta)},$$

and the Rich-ARQ coverage-area ratio is $\rho = \left(d_{\max}^{\text{rich}} / d_{\max}^{\text{1b}}\right)^2 = 10^{\Delta\gamma / (5 \eta)}$.

For realistic path-loss exponents, the $8.8$ dB (Turbo) and $9.5$ dB (Polar) gaps translate into substantial coverage extensions, with the Polar comparison yielding the larger range gain. This suggests that AP density can be reduced by the factor $\rho$ for an equivalent error target (Chen et al., 8 Feb 2026).
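The coverage algebra above is straightforward to evaluate numerically. The path-loss exponent used below is a hypothetical urban-macro value, not one taken from the paper.

```python
def coverage_gain(delta_gamma_db, path_loss_exp):
    """Range and area (AP-density) factors implied by an SNR margin under
    log-distance path loss PL(d) = PL0 + 10*eta*log10(d/d0)."""
    range_ratio = 10 ** (delta_gamma_db / (10 * path_loss_exp))
    area_ratio = range_ratio ** 2  # rho: each AP covers this many times more area
    return range_ratio, area_ratio

# Hypothetical path-loss exponent eta = 3.5.
for name, gap_db in [("Turbo-HARQ", 8.8), ("Polar-HARQ", 9.5)]:
    r, rho = coverage_gain(gap_db, 3.5)
    print(f"vs {name}: range x{r:.2f}, coverage area x{rho:.2f}")
```

Because $\rho$ depends exponentially on $\Delta\gamma/\eta$, even modest differences in the assumed path-loss exponent change the implied AP-density savings noticeably.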
6. Experimental Setup, Findings, and Comparative Metrics
A full-stack, standard-compliant prototype based on USRP X310 radios and an OFDM PHY (15 kHz subcarrier spacing, 1 PRB = 12 subcarriers × 1 ms) implements the Rich-ARQ framework. The experimental protocol comprises downlink synchronization, periodic uplink grants, and Rich-ARQ sessions over multiple rounds.
Key results include:
- At the target packet error rate, Rich-ARQ requires $8.8$ dB ($9.5$ dB) less SNR than Turbo-HARQ (Polar-HARQ).
- The coverage-area ratio increases substantially against both baselines, implying correspondingly fewer APs required at the target PER.
- Rich-ARQ exhibits monotonic robustness across SNRs, in contrast to previous deep learning-based feedback methods (e.g., GBAF fails away from training SNR).
- End-to-end latency reduction: the asynchronous AFC cuts total latency by roughly 43% relative to synchronous operation. Across various slot settings, AFC achieves lower effective forward intervals.
- Encoder model/compute: 21K parameters and 444K FLOPs (UE) vs. 40K/661K (AP); encoder runtime on GPU is $67.5$ ms (Rich-ARQ) versus $111$ ms (vanilla), a roughly $1.6\times$ speedup. Estimated FPGA inference times are $0.08$ ms (Kintex-7) or $0.56$ ms (Spartan-7) (Chen et al., 8 Feb 2026).
7. Practical Implementation and Deployment Considerations
Rich-ARQ uses a non-blocking, deadline-aware architecture:
- The PHY thread never blocks on encoder network inference. When an uplink PRB becomes available, it asynchronously requests the encoder output; if the output is not ready by the transmit deadline, an empty slot is sent instead.
- At the AP, feedback and final decoding are decoupled—feedback generation executes in a low-latency thread, while final decoding aggregates all forward receptions.
- Encoder complexity is minimized for on-device use via pruning, sparse feed-forward, and asymmetric model size.
- FPGA-friendly network designs (compact transformer/MLP layers) enable (sub-)ms inference latencies.
- Robustness to real-world channel/SNR fluctuations is secured by SNR-conditioned curricula with stochastic perturbation during training.
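The non-blocking, deadline-aware pattern in the first bullet can be sketched with a queue and a timeout. The names and timings below are illustrative, not taken from the prototype's codebase.

```python
import queue
import threading
import time

def phy_tx_slot(encoder_q, deadline_s):
    """Deadline-aware PHY slot: use the encoder's codeword if it arrived in
    time, otherwise transmit an empty slot so radio timing never blocks on
    neural-network inference."""
    try:
        return encoder_q.get(timeout=deadline_s)
    except queue.Empty:
        return None  # empty slot; inference finishes in the background

# Simulate an inference call that misses the first slot deadline.
q = queue.Queue()
threading.Thread(target=lambda: (time.sleep(0.05), q.put("codeword")),
                 daemon=True).start()
first = phy_tx_slot(q, deadline_s=0.01)   # not ready yet -> empty slot
second = phy_tx_slot(q, deadline_s=0.2)   # ready by the next deadline
```

The same decoupling applies at the AP, where the low-latency feedback thread and the final bit-decoding stack run independently of each other.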
Rich-ARQ demonstrates compatibility with existing 4G/5G OFDM PHYs, showing that high-dimensional, neural-coded “rich” inference-time ARQ can deliver marked improvements in reliability, coverage, and latency, even under feedback delay, SNR variation, and practical hardware constraints (Chen et al., 8 Feb 2026).