Selective Retransmission Strategy

Updated 20 December 2025

Selective retransmission is an adaptive communication strategy that transmits only essential data based on uncertainty and redundancy in collaborative inference systems.
It leverages decision-theoretic and information-theoretic criteria, such as entropy thresholds and attention mechanisms, to dynamically trigger retransmission.
This approach minimizes communication overhead and energy consumption while maintaining robust inference accuracy in multi-device and resource-constrained environments.

A selective retransmission strategy is an adaptive communication mechanism for collaborative edge-to-server inference that decides, for each inference, whether additional data must be transmitted, based on task-relevant uncertainty or redundancy in intermediate features or decisions. Its primary goal is to minimize communication cost while preserving inference accuracy, notably in multi-device or resource-constrained settings. Such mechanisms use decision-theoretic and information-theoretic criteria to identify the visual content or bits that are essential to transmit, often leveraging model uncertainty, redundancy among devices, or attention-derived region-of-interest (RoI) detection.

1. Background and Conceptual Foundations

Selective retransmission mechanisms appear prominently in collaborative inference architectures that seek to maximize task performance under bandwidth, latency, or energy constraints. Classic edge-server models transmit all candidate features blindly to the server, incurring unnecessary communication overhead, particularly when local predictions are already confident or cross-device feature codes are highly redundant. To address this, selective retransmission leverages uncertainty metrics (such as entropy or min-entropy of local predictions), attention or saliency mechanisms, or adaptive bit selection strategies to transmit only the indispensable information required by the server for improved accuracy or downstream decision-making (Song et al., 18 Dec 2025, Im et al., 2024, Shao et al., 2021).

2. Mathematical Formulation and Decision Criteria

The formal basis of selective retransmission is often couched in information-theoretic objectives, variational approximations, and entropy-thresholding rules. A prototypical pipeline is as follows:

Uncertainty quantification: For each inference, the server evaluates an uncertainty metric, e.g., the min-entropy $H_{\text{m}} = -\log_2 \max_w p_\theta(w \mid \text{context})$ of each output token's probability distribution (Song et al., 18 Dec 2025, Im et al., 2024).
Retransmission decision: A retransmission request is triggered when the min-entropy averaged over the output tokens, $\overline{H}_{\text{m}}$, exceeds a threshold, i.e., $\overline{H}_{\text{m}} > \tau$, where $\tau$ is a developer-specified value tuned for the desired communication-accuracy trade-off.
Batch-selective retransmission in multi-device settings: The server aggregates codewords $\{u_k\}_{k=1}^{K}$ and computes a confidence score (e.g., the maximum softmax probability over the joint prediction). If below threshold, a sparse attention mechanism queries additional codewords or requests retransmission only from the least informative devices (Shao et al., 2021).
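
The uncertainty and decision steps above can be sketched in a few lines of Python. This is a minimal illustration, not the cited authors' implementation: the function names (`min_entropy`, `average_min_entropy`, `should_retransmit`), the threshold value, and the toy probabilities are all assumptions, and each token distribution is assumed to be a plain list of probabilities summing to one.

```python
import math
from typing import Sequence


def min_entropy(probs: Sequence[float]) -> float:
    """Min-entropy H_m = -log2(max_w p(w)) of one distribution, in bits."""
    return -math.log2(max(probs))


def average_min_entropy(token_dists: Sequence[Sequence[float]]) -> float:
    """Mean min-entropy over the per-token distributions of one output sequence."""
    return sum(min_entropy(p) for p in token_dists) / len(token_dists)


def should_retransmit(token_dists: Sequence[Sequence[float]], tau: float) -> bool:
    """Request retransmission when the averaged min-entropy exceeds tau."""
    return average_min_entropy(token_dists) > tau


# Toy distributions (illustrative values only).
confident = [[0.9, 0.05, 0.05], [0.8, 0.1, 0.1]]
uncertain = [[0.4, 0.3, 0.3], [0.5, 0.25, 0.25]]

print(should_retransmit(confident, tau=0.5))  # False
print(should_retransmit(uncertain, tau=0.5))  # True
```

Because min-entropy depends only on the largest probability, it is cheap to compute from the model's output distribution; a peaked distribution gives a value near zero, and a flat one gives a large value.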

The canonical algorithm, as outlined by Song et al. (18 Dec 2025), is:

The server runs inference on the downsampled global input and computes the uncertainty over the output sequence.
If the prediction is confident, transmission terminates; otherwise, the server identifies the region of interest and requests the detail-preserving local input or essential features from the edge.
The server fuses the global and local inputs to produce a refined output.
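
The three steps above can be sketched as a single control-flow function. This is a hedged outline under stated assumptions rather than the published algorithm: `selective_inference`, `Outcome`, `Box`, and the three callables (`infer`, `locate_roi`, `request_local`) are hypothetical placeholders that a caller would supply, and the uncertainty score is assumed to be any scalar where larger means less confident.

```python
from typing import Callable, List, NamedTuple, Tuple

Box = Tuple[int, int, int, int]  # hypothetical RoI format: (x0, y0, x1, y1)


class Outcome(NamedTuple):
    answer: str
    retransmitted: bool


def selective_inference(
    global_input: object,
    infer: Callable[[List[object]], Tuple[str, float]],
    locate_roi: Callable[[object], Box],
    request_local: Callable[[Box], object],
    tau: float,
) -> Outcome:
    """Run the three-step pipeline; every callable is a caller-supplied placeholder."""
    # Step 1: first pass on the downsampled/global input, with an uncertainty score.
    answer, uncertainty = infer([global_input])

    # Step 2: stop early when the first pass is confident enough.
    if uncertainty <= tau:
        return Outcome(answer, retransmitted=False)

    # Otherwise identify the region of interest and request a detailed local input.
    roi = locate_roi(global_input)
    local_input = request_local(roi)

    # Step 3: fuse global and local inputs for a refined output.
    refined, _ = infer([global_input, local_input])
    return Outcome(refined, retransmitted=True)


# Toy stand-in: the "model" is uncertain with one input and confident with two.
def toy_infer(inputs: List[object]) -> Tuple[str, float]:
    return ("refined", 0.1) if len(inputs) == 2 else ("draft", 1.2)


result = selective_inference(
    "thumbnail", toy_infer, lambda img: (0, 0, 8, 8), lambda roi: "crop", tau=0.5
)
print(result)  # Outcome(answer='refined', retransmitted=True)
```

Passing the model and the transport as callables keeps the decision logic separate from any particular network or wire protocol, which is a design choice of this sketch, not something the source prescribes.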

3. Frameworks and Algorithms

Selective retransmission can be instantiated across modalities and model architectures:

Vision-language models (VLMs): The server runs inference on a downsampled image and, if uncertain (min-entropy above threshold), requests a high-resolution local image of the attention-derived RoI from the edge, fusing global and local tokens for refinement (Song et al., 18 Dec 2025).
Transformer-based collaborative inference: Clients transmit only the most salient patches based on attention scores and entropy thresholds; server-side decisions govern further patch requests (Im et al., 2024).
Distributed Information Bottleneck frameworks: Multiple edge devices extract compressed task-relevant features, transmit initial codewords, and support multi-round selective retransmission coordinated by server attention modules to eliminate cross-device redundancy (Shao et al., 2021).
Multi-view encoding and dynamic scheduling: Redundant or non-informative features across views are pruned, and only task-critical bits are requested for retransmission, optimizing rate-relevance trade-offs.
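
As an illustration of attention-based patch selection, the sketch below keeps only the highest-scoring patches. It assumes a simple top-k rule over one attention score per patch; the function name `select_salient_patches`, the parameter `k`, and the scores are hypothetical, and the cited work may combine attention with entropy thresholds in a different way.

```python
from typing import List, Sequence


def select_salient_patches(attention: Sequence[float], k: int) -> List[int]:
    """Return the indices of the k patches with the highest attention, in image order."""
    ranked = sorted(range(len(attention)), key=lambda i: attention[i], reverse=True)
    return sorted(ranked[:k])


# One made-up attention score per patch.
scores = [0.05, 0.40, 0.10, 0.30, 0.15]
print(select_salient_patches(scores, k=2))  # [1, 3]
```

Returning indices rather than pixel data lets the client transmit only the selected patches together with their positions, so the server can place them correctly.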

Table: Selective Retransmission Mechanisms

Setting	Uncertainty Metric	Retransmission Trigger
VLM edge-server (Song et al., 18 Dec 2025)	Min-entropy (avg over output tokens)	$\overline{H}_{\text{m}} > \tau$
ViT patch selection (Im et al., 2024)	Min-entropy over class probabilities	$H_{\min}(p) > \eta$
Distributed IB (Shao et al., 2021)	Confidence score (max softmax)	$\delta_\tau < \delta_0$
Multi-round attention (Shao et al., 2021)	Attention module (binary gating)	Attention score < threshold

4. Communication–Accuracy Trade-Off Analysis

Selective retransmission exhibits a fundamental communication-accuracy trade-off parameterized by the uncertainty threshold. Increasing the threshold reduces communication cost but may degrade accuracy if too many uncertain first-pass predictions are accepted without refinement. These trade-offs are empirically quantified as piecewise-concave curves; for example, in vision-language models, selective retransmission achieves near-oracle accuracy at just 20–30% of the cost of unconditional retransmission (Song et al., 18 Dec 2025). In distributed IB frameworks, selective retransmission reduces average bit-cost by 10–15% compared to full retransmission at no loss of accuracy, pushing performance toward joint-coding upper bounds (Shao et al., 2021).
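
To make the role of the threshold concrete, the following sketch sweeps $\tau$ over a handful of made-up records and reports the retransmission rate and the resulting accuracy. The record layout, the name `tradeoff_point`, and all numbers are assumptions chosen for illustration; they are not results from the cited papers.

```python
from typing import List, Tuple

# (uncertainty, first_pass_correct, refined_correct)
Sample = Tuple[float, bool, bool]


def tradeoff_point(samples: List[Sample], tau: float) -> Tuple[float, float]:
    """Return (retransmission rate, accuracy) when retransmitting only if uncertainty > tau."""
    n = len(samples)
    retransmissions = sum(1 for u, _, _ in samples if u > tau)
    # A sample counts as correct via its refined output if it was retransmitted,
    # and via its first-pass output otherwise.
    correct = sum(1 for u, first, refined in samples if (refined if u > tau else first))
    return retransmissions / n, correct / n


# Made-up calibration records, for illustration only.
toy = [(0.1, True, True), (0.3, True, True), (0.9, False, True),
       (1.4, False, True), (2.0, False, False)]

for tau in (0.0, 0.5, 1.0, 3.0):
    rate, acc = tradeoff_point(toy, tau)
    print(f"tau={tau:.1f}  retransmission rate={rate:.1f}  accuracy={acc:.1f}")
```

On this toy data, $\tau = 0.5$ reaches the same accuracy as always retransmitting ($\tau = 0$) while retransmitting only 60% of the samples, and raising $\tau$ further lowers both cost and accuracy.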

5. Extensions and Generalizations

Several extensions to basic selective retransmission have been proposed:

Adaptive multi-round strategies: Clients and server interact over multiple rounds, transmitting incrementally more features only as required by the server's predictive uncertainty (Im et al., 2024, Shao et al., 2021).
Cross-modal/generalized frameworks: Selective retransmission can be integrated with patch/token selection in audio, text, or multimodal inference, using attention or gradient-derived saliency (Im et al., 2024).
Server-side confidence-driven patch queries: The server can trigger sparse, targeted queries to optimize computation and bandwidth in resource-constrained collaborative settings.
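
A multi-round exchange of this kind can be sketched as a loop that stops as soon as the server is confident enough. The names `multi_round_inference`, `confidence_of`, `delta_0`, and `max_rounds` are hypothetical, and the toy confidence function simply grows with the number of chunks received; a real system would compute confidence from the model's prediction.

```python
from typing import Callable, List, Sequence


def multi_round_inference(
    chunks: Sequence[object],
    confidence_of: Callable[[List[object]], float],
    delta_0: float,
    max_rounds: int,
) -> List[object]:
    """Accumulate feature chunks round by round until confidence reaches delta_0."""
    received: List[object] = []
    for chunk in chunks[:max_rounds]:
        received.append(chunk)  # one round of (re)transmission
        if confidence_of(received) >= delta_0:
            break  # confident enough: stop requesting more
    return received


# Toy confidence: each received chunk adds 0.3, capped at 1.0.
def toy_confidence(received: List[object]) -> float:
    return min(1.0, 0.3 * len(received))


used = multi_round_inference(list("abcde"), toy_confidence, delta_0=0.8, max_rounds=5)
print(len(used))  # 3
```

The `max_rounds` cap bounds the worst-case latency of the exchange, which matters when the protocol has to run in real time.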

6. Empirical Evaluation and Practical Guidelines

Experimental results consistently show substantial reductions in communication overhead with negligible accuracy loss under selective retransmission regimes:

VLMs: Achieve 59.5% accuracy at 28% retransmission cost, nearly matching 60.1% at full retransmission, amounting to ~72% communication savings (Song et al., 18 Dec 2025).
ViT edge inference: 68% reduction in communicated image data with only 1 percentage point loss in accuracy (Im et al., 2024).
Multi-device distributed IB: 10–15% bit-cost reduction and improved rate-relevance trade-off versus all baselines (Shao et al., 2021).

Practical recommendations include tuning the threshold to target desired accuracy and communication budgets, leveraging compression alongside retransmission to optimize bandwidth further, and integrating attention-guided cropping and transmission selection for maximal efficiency.
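
One simple way to tune the threshold against a communication budget is to pick it from the empirical distribution of uncertainty scores on a calibration set. The sketch below is an assumption of this write-up, not a procedure taken from the cited papers: `threshold_for_budget` and its arguments are hypothetical names, and the scores are made up.

```python
import math
from typing import Sequence


def threshold_for_budget(uncertainties: Sequence[float], budget: float) -> float:
    """Pick tau so that at most a `budget` fraction of samples has uncertainty > tau."""
    ranked = sorted(uncertainties)
    n = len(ranked)
    allowed = math.floor(budget * n)  # number of samples allowed to retransmit
    if allowed >= n:
        return -math.inf  # budget covers everything: always retransmit
    # Only the `allowed` largest scores can be strictly above this value.
    return ranked[n - allowed - 1]


scores = [0.1, 0.3, 0.9, 1.4, 2.0]  # made-up calibration uncertainties
print(threshold_for_budget(scores, budget=0.4))  # 0.9
```

With a budget of 0.4, two of the five calibration samples (those scoring 1.4 and 2.0) exceed the returned threshold. The accuracy at that threshold still has to be checked separately, since this rule controls cost only.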

7. Context, Limitations, and Outlook

Although selective retransmission substantially improves resource efficiency, open issues include potentially conservative entropy thresholds, interoperability with black-box architectures, and real-time performance in multi-round protocols. Extensions such as learned gating networks, adaptive batch-selective mechanisms, and cross-modal deployments are active areas of research. The strategy’s effectiveness is grounded in probabilistic modeling and task-oriented information theory, and it is broadly applicable across semantic communications, multi-device collaborative inference, video analytics, and multimodal fusion tasks (Song et al., 18 Dec 2025, Im et al., 2024, Shao et al., 2021).

References:

"Collaborative Edge-to-Server Inference for Vision-Language Models" (Song et al., 18 Dec 2025)
"Attention-aware Semantic Communications for Collaborative Inference" (Im et al., 2024)
"Task-Oriented Communication for Multi-Device Cooperative Edge Inference" (Shao et al., 2021)
